Quantum mechanics and information theory are among
the most important scientific discoveries of the last century.
Although these two areas initially developed separately, it has
emerged that they are in fact intimately related. In this review
I will show how quantum information theory extends traditional
information theory by exploring the limits imposed
by quantum, rather than classical, mechanics on information
storage and transmission. The derivation of many key results
uniquely differentiates this review from the "usual" presentation
in that they are shown to follow logically from one crucial
property of relative entropy. Within the review, optimal
bounds on the speed-up that quantum computers can achieve
over their classical counterparts are outlined using information
theoretic arguments. In addition, important implications
of quantum information theory for thermodynamics and quantum
measurement are intermittently discussed. A number of
simple examples and derivations, including quantum superdense
coding, quantum teleportation, and Deutsch's and Grover's
algorithms, are also included.
I. INTRODUCTION

Quantum physics not only provides the most complete 
description of physical phenomena known to man, it also 
provides a new philosophical framework for our understanding
of nature. It enables us to accurately model systems ranging
from microscopic ones such as quarks and atoms to large
cosmic objects such as black holes. Information theory,
on the other hand, teaches us about our physical ability 
to store and process information. Without a formalised 
information theory many of the recent developments in 
telecommunications, computer science and engineering 
would simply not have been possible. Although quan- 
tum physics and information theory initially developed 
separately, their recent integration is seen as yet another 
important step towards understanding the fundamental 
properties and limitations of nature. 

One of the central information theoretic concepts in 
science is that of distinguishability. Inevitably an ani- 
mal's survival depends on its ability to distinguish a mate 
from a predator or a prey. In the same way, physical ex- 
periments aim to be sensitive enough to be able to dis- 
tinguish one hypothesis from another. It is however no 
surprise that the influence of the concept of distinguisha- 
bility is felt far beyond science. Life consists of a series of 
decisions that have to be made. This we do, consciously 
or unconsciously, by evaluating all the alternatives and 
distinguishing consequences of various different alterna- 
tive actions. 

The purpose of this review is to show that the appar- 
ently simple concept of distinguishability is at the root 
of information processing. Ultimately, how well we can 
distinguish different physical states determines how much
information we can encode into a certain system and how 
quickly we can manipulate it. Distinguishability in turn 
is completely dependent upon the laws of physics, and 
quantum physics naturally allows for more versatile in- 
formation processing than classical physics. The reason- 
ing behind this is that unlike with classical states, two 
different quantum states are not necessarily fully distin- 
guishable. It is interesting to note that although this at 
first seems like a limitation, it in fact presents us with 
significantly more possibilities for information encoding 
and transmission. 

In this review I first plan to argue that the relative en- 
tropy is the most appropriate quantity to measure distin- 
guishability between different states. The proper frame- 
work to talk about states is, of course, quantum mechan- 
ics, so it is necessary to define the quantum relative en- 
tropy. I prove that the relative entropy, both classical 
and quantum, does not increase with time. Thus two 
states can only become less distinguishable as they un- 
dergo any kind of evolution. This result will be central 
to my review as subsequent results will follow from this 
simple fact. 

I then go on to show that the 'no increase of rela- 
tive entropy' principle tells us about the ability of quan- 
tum states to store and process information. Information 
has to be encoded and manipulated in physical systems. 
Therefore, distinguishability of different states within a 
physical system is a prerequisite. Looking at this from 
the point of view of communication, what does it mean 
to send and receive a message? Sending a message suc- 
cessfully means encoding the information we wish to send 
into a structured format which the receiver must be able 
to unambiguously distinguish. The communication capacity
can then be thought of as the rate at which
we can send and receive messages. The rate of successful 
transmission is determined by the relative entropy be- 
tween various encoding states. 

What is less obvious, but nonetheless equally true, is 
that computation can also be viewed as a special kind 
of communication. This will allow the use of the relative 
entropy to quantify the efficiency (i.e. speed) of quantum 
computation in general. 

The role of measurement within quantum mechanics 
and therefore information theory is paramount. Classi- 
cally the measurement process is implicit because phys- 
ical quantities have well defined pre-existing properties. 
For example, a classical bit is either in the state 0 or
1, whereas a quantum bit can exist in a combination of
the two states. At the end, a measurement is necessary
to "collapse" this combination to a classical result which
we can then read. The very concept of efficiency of a 
measurement can also be quantified using the relative 
entropy. A measurement, like a communication process, 
creates correlations between a system and an apparatus 
with the purpose of the apparatus receiving an amount 
of information from the system. The opposite of this 
process, namely, deleting of information, can be seen to
be at the root of irreversibility and this invariably con-
tributes to an increase in the entropy of the environment. 
This amount is exactly quantified using the relative en- 
tropy between the environmental state and the apparatus 
state and provides an exciting link between information 
theory, computation, thermodynamics and quantum me- 
chanics. But, before we reach this exciting stage, our 
long journey has to begin with a much simpler question: 
how do we quantify uncertainty in a physical state? 

II. RELATIVE ENTROPY 

Fundamental to our understanding of distinguishability
is the measure of uncertainty in a given probability
distribution. This uncertainty can be quantified by introducing
the idea of "surprise". Suppose that a certain
event happens with a probability p. Then we would like 
to quantify how surprised we are when that event does 
happen. The first guess would be 1/p: the smaller the 
probability of an event, the more surprised we are when 
the event happens and vice versa. However, an event 
might be composed of two independent events which happen
with probabilities $q$ and $r$ respectively, so that the
probability of both events occurring is $p = q \times r$. We
would now intuitively expect that the surprise of $p$ is the
same as the surprise of $q$ plus the surprise of $r$. But
$1/p \neq 1/q + 1/r$, so that $1/p$ is not really a satisfactory
definition from this perspective. Instead, if we define
surprise as $\ln(1/p)$, then the above property, called additivity,
is satisfied, since $-\ln p_1 p_2 = -\ln p_1 - \ln p_2$. With
a probability distribution $\{p_n\}$ ($\sum_n p_n = 1$) the total uncertainty
is just the average of all the surprises. Additivity
of uncertainties of statistically independent events is such 
a stringent condition that it basically leads to a unique 
measure (Shannon and Weaver, 1949) up to a constant 
and logarithm base. 

Definition. The uncertainty in a collection of possible
states $a_i$ with corresponding probability distribution
$p(a_i)$ is given by its entropy

$$ S(p) := -\sum_i p(a_i) \ln p(a_i) \qquad (1) $$

called the Shannon entropy. We note that there is no 
Boltzmann constant term in this expression, as there is 
for the physical entropy, since it is by convention set to 
unity. This measure is suitable for the states of systems 
described by the laws of classical physics, but it will have 
to be changed, along with other classical measures, when
we present the quantum information theory. 
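
The definition of surprise and of the Shannon entropy can be illustrated with a short numerical sketch. The Python below is only an illustration under stated assumptions: the function names `surprise` and `shannon_entropy` and the example probabilities are chosen here and are not part of the original presentation. It checks additivity of the surprise for two independent events and evaluates the entropy (in natural units) for a fair coin and for a certain outcome.

```python
import math

def surprise(p):
    """Surprise ln(1/p) of an event with probability p, in natural units."""
    return -math.log(p)

def shannon_entropy(probs):
    """Average surprise S(p) = -sum_i p_i ln p_i, with 0 ln 0 taken as 0."""
    return sum(p * surprise(p) for p in probs if p > 0)

# Additivity: two independent events with (illustrative) probabilities q and r.
q, r = 0.5, 0.25
joint_surprise = surprise(q * r)
sum_of_surprises = surprise(q) + surprise(r)

# Entropy of a fair coin (ln 2) and of a certain outcome (zero).
fair_coin = shannon_entropy([0.5, 0.5])
certain = shannon_entropy([1.0, 0.0])
```

The logarithm is what makes the two surprise values agree; the naive choice 1/p would give 8 for the joint event against 2 + 4 = 6 for the sum.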

I ultimately wish to be able to talk about storing and 
processing information. For this we require a means of 
comparing two different probability distributions, which 
is why I introduce the notion of relative entropy (first in- 
troduced by Kullback and Leibler, 1951). Suppose that a 
collection of events has the probability distribution $\{p_i\}$,
but we mistakenly think that this probability distribution
is $\{q_i\}$. For example, we have a coin which we think
is fair, i.e. the probability of a head or a tail is equal.
If we toss this coin n times, on average we expect heads 
half of the time and tails the other half. In reality, the 
coin by virtue of its uneven weight distribution will not 
be completely fair so that our expectation will turn out 
to be wrong. There will consequently be a discrepancy 
between our expected and real probability distribution. 
This discrepancy is very frequently the case in real life 
and it is, in fact, very rare that we have complete information
about any event. Therefore we can formalise that
when a particular outcome $j$ happens, we associate the
surprise $-\ln q_j$ with it. The average surprise, or information,
according to this erroneous belief, is

$$ -\sum_i p_i \ln q_i . $$

Since events happen with probabilities $\{p_i\}$ (in spite of
our belief!) these are the correct ones to feature in the
averaging process. However, the real amount of information
we are obtaining is, as defined before, given by
the Shannon entropy $S(p) = -\sum_i p_i \ln p_i$. It is not so
difficult to show that $S(p) \le -\sum_i p_i \ln q_i$ (equality holds
if and only if $p_i = q_i$ for all $i$), so that there is an "uncertainty
deficit", as it were, stemming from our wrong
assumption and is equal to the difference between the 
two averages. This deficit quantity is called the relative 
entropy. 

Definition. Suppose that we have two sets of discrete
events $a_i$ and $b_j$ with the corresponding probability distributions,
$p(a_i)$ and $p(b_j)$. The relative entropy between
these two distributions is defined as

$$ S(p(a)\|p(b)) := \sum_i p(a_i) \ln \frac{p(a_i)}{p(b_i)} . \qquad (2) $$

This function is a measure of the 'distance' between
$p(a_i)$ and $p(b_j)$, even though, strictly speaking, it is not
a mathematical metric since it fails to be symmetric:
$S(p(a)\|p(b)) \neq S(p(b)\|p(a))$. This is interesting since
at first it looks as if there should be no difference between
mistaking the probability distribution $p_i$ for $q_i$, or vice
versa. Intuitively this can be explained using our coin 
example. Suppose that someone gives us a coin which is 
either fair or completely unfair, e.g. always gives heads. 
Now we have to toss this coin a number of times and infer 
which of the two coins we have. If we toss the fair coin 
and obtain tails, then our inference will immediately be 
that we have the fair coin. If, however, we obtain heads, 
then it could be either coin. By tossing more times, the 
fair coin would eventually give us a tails. If, however, 
we were holding the completely unfair coin from the be- 
ginning, then even after 100 heads we can never really 
eliminate the fair coin since this outcome is statistically 
possible (although highly unlikely). Therefore how certain
we are about which coin we hold is clearly dependent
on whichever coin we hold and how different it is to the
other one. As we will see shortly, our uncertainty is quantified
by the relative entropy and it is thus to be expected
that it is asymmetric. I now describe this statistical approach
in more detail.
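
The asymmetry in the coin example can be made concrete with a small sketch. The Python below is illustrative only: the helper `relative_entropy` and the variable names are assumptions introduced here, and the convention that the relative entropy is infinite when the second distribution forbids an outcome that the first one produces is the standard one rather than something stated in the text.

```python
import math

def relative_entropy(p, q):
    """S(p||q) = sum_i p_i ln(p_i / q_i); infinite if q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 ln 0 is taken as 0
        if qi == 0:
            return math.inf   # q forbids an outcome that p produces
        total += pi * math.log(pi / qi)
    return total

fair = [0.5, 0.5]     # (heads, tails)
unfair = [1.0, 0.0]   # always heads

# All-heads data can still come from the fair coin, with probability 2^(-n).
unfair_versus_fair = relative_entropy(unfair, fair)   # ln 2, finite

# A single tails is impossible for the always-heads coin.
fair_versus_unfair = relative_entropy(fair, unfair)   # infinite
```

The finite value corresponds to never being able to exclude the fair coin with certainty after a run of heads, while the infinite value corresponds to the fact that one tails excludes the always-heads coin at once.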

A. Statistical significance 

A more operational interpretation of both the Shannon 
entropy and the relative entropy comes from the statisti- 
cal point of view. The generalization of this formalism to 
the quantum domain will be presented in the next section 
and we will offer an operational interpretation of the mea- 
sures of quantum correlations to be introduced therein. I 
now follow the approaches of Cover and Thomas (Cover 
and Thomas, 1991), and Csiszar and Korner (Csiszar and 
Korner, 1981) and the reader interested in more detail 
should consult these two books. 

Let $X_1, X_2, \ldots, X_n$ be a sequence of $n$ symbols from an
alphabet $A = \{a_1, a_2, \ldots, a_{|A|}\}$, where $|A|$ is the size of
the alphabet. We denote a sequence $x_1, x_2, \ldots, x_n$ by
$x^n$ or, equivalently, by $\mathbf{x}$. The type $P_{\mathbf{x}}$ of a sequence
$x_1, x_2, \ldots, x_n$ is the relative proportion of occurrences
of each symbol of $A$, i.e. $P_{\mathbf{x}}(a) = N(a|\mathbf{x})/n$ for
all $a \in A$, where $N(a|\mathbf{x})$ is the number of times the symbol
$a$ occurs in the sequence $\mathbf{x} \in A^n$. Thus, according
to this definition the sequences 011010 and 100110 are
of the same type. $\mathcal{P}_n$ will denote the set of types with
denominator $n$. If $P \in \mathcal{P}_n$, then the set of sequences of
length $n$ and type $P$ is called the type class of $P$, denoted
by $T(P)$, i.e. mathematically

$$ T(P) = \{\mathbf{x} \in A^n : P_{\mathbf{x}} = P\} . $$

We now approach the first theorem about types which is 
at the heart of success of this theory and states that the 
number of types increases only polynomially with n. 
Theorem 1.

$$ |\mathcal{P}_n| \le (n+1)^{|A|} $$

Proof of this is left for the reader, but the rationale is simple.
Suppose that we generate an n-string of 0s and 1s.
The number of different types is then $n + 1$, i.e. polynomial
in $n$: the zeroth type has only one string - all zeros,
the first type has $n$ strings - all strings containing exactly
one 1, the second type has $n(n-1)/2$ strings - all those
containing exactly two 1s, and so on; the $n$th type has
only one sequence - all ones. The most important point is
that the number of sequences is exponential in n, so that 
at least one type has exponentially many sequences in its 
type class, since there are only polynomially many dif- 
ferent types. A simple example is a coin tossed n times. 
If it is a fair coin, then we expect heads half of the time 
and tails the other half of the time. The number of all possible
sequences for this coin is $2^n$ (i.e. exponential in $n$),
where each sequence is equally likely (with probability
$2^{-n}$). However, the size of the type class where there is
an equal number of heads and tails is $\binom{n}{n/2}$ (the number
of possible ways of choosing $n/2$ elements out of $n$ elements),
the base-2 logarithm of which tends to $n$ for large $n$. Hence
this type class is in some sense asymptotically as large as
all the type classes together.
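
The counting argument can be checked directly for moderate $n$. The sketch below is an illustration rather than part of the original derivation: the function names are chosen here, and the count of types uses the standard "stars and bars" formula for splitting $n$ occurrences among $|A|$ symbols.

```python
import math

def number_of_types(n, alphabet_size):
    """Number of types with denominator n: ways to split n counts among the symbols."""
    return math.comb(n + alphabet_size - 1, alphabet_size - 1)

def type_bound(n, alphabet_size):
    """The polynomial upper bound (n + 1)^{|A|} of Theorem 1."""
    return (n + 1) ** alphabet_size

n = 1000
binary_types = number_of_types(n, 2)      # n + 1 types for a binary alphabet
balanced_class = math.comb(n, n // 2)     # sequences with equally many 0s and 1s

# Base-2 logarithm per symbol of the balanced type class; approaches 1 as n grows.
rate = math.log2(balanced_class) / n
```

For $n = 1000$ the rate is already within one percent of 1, which is the sense in which the balanced type class is asymptotically as large as the whole set of $2^n$ sequences.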

We now arrive at a very important theorem for us, 
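which, in fact, presents the basis of the statistical interpretation
of the Shannon entropy and relative entropy.
Theorem 2. If $X_1, X_2, \ldots, X_n$ are drawn according to
$Q(x)$, then the probability of $\mathbf{x}$ depends only on its type
and is given by

$$ Q^n(\mathbf{x}) = e^{-n(S(P_{\mathbf{x}}) + S(P_{\mathbf{x}}\|Q))} . $$

Proof.

$$ Q^n(\mathbf{x}) = \prod_{i=1}^{n} Q(x_i) = \prod_{a \in A} Q(a)^{N(a|\mathbf{x})} = \prod_{a \in A} Q(a)^{n P_{\mathbf{x}}(a)} $$
$$ = \exp\Big\{ n \sum_{a \in A} \Big[ -P_{\mathbf{x}}(a) \ln \frac{P_{\mathbf{x}}(a)}{Q(a)} + P_{\mathbf{x}}(a) \ln P_{\mathbf{x}}(a) \Big] \Big\} $$
$$ = e^{-n(S(P_{\mathbf{x}}) + S(P_{\mathbf{x}}\|Q))} . $$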

Therefore the probability of a sequence becomes exponentially
small as $n$ increases. Indeed, our coin tossing example
shows this: the probability for any particular sequence
(such as e.g. 0000011111) is $2^{-n}$ (note: the reason that
we are using $e$ in our theorems instead of 2 is because we
are also using ln instead of log). This is explicitly stated
in the following corollary.
Corollary. If $\mathbf{x}$ is in the type class of $Q$, then

$$ Q^n(\mathbf{x}) = e^{-nS(Q)} . $$

The proof is left to the reader. 
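
Theorem 2 and its corollary can be verified numerically on short strings. The following Python sketch is only illustrative: the distribution $Q = (2/3, 1/3)$, the example sequences and all function names are assumptions made here. It compares the probability of a sequence computed symbol by symbol with the expression $e^{-n(S(P_{\mathbf{x}}) + S(P_{\mathbf{x}}\|Q))}$ computed from the type alone.

```python
import math
from collections import Counter

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def relative_entropy(p, q):
    # Assumes q has no zero entries, which holds for the Q used below.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def type_of(sequence, alphabet):
    """Relative frequency of each alphabet symbol in the sequence."""
    counts = Counter(sequence)
    return [counts[a] / len(sequence) for a in alphabet]

def direct_probability(sequence, alphabet, Q):
    """Product of the single-symbol probabilities Q(x_i)."""
    prob = 1.0
    for symbol in sequence:
        prob *= Q[alphabet.index(symbol)]
    return prob

def probability_from_type(sequence, alphabet, Q):
    """exp(-n (S(P_x) + S(P_x||Q))), which depends on the sequence only via its type."""
    n = len(sequence)
    P = type_of(sequence, alphabet)
    return math.exp(-n * (entropy(P) + relative_entropy(P, Q)))

alphabet = "01"
Q = [2 / 3, 1 / 3]
x = "0010110000"   # seven 0s and three 1s
y = "1110000000"   # same type as x, different order
z = "001"          # its type equals Q, so the corollary applies
```

Sequences `x` and `y` have the same type and therefore the same probability, and for `z` the probability reduces to $e^{-nS(Q)}$ as the corollary states.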

So, as n gets large, most of the sequences become typi- 
cal and they are all equally likely. Therefore the probabil- 
ity of every typical sequence times the number of typical 
sequences has to be equal to unity in order to conserve 
total probability ($e^{-nS(Q)} N = 1$). From this we can see
that the number of typical sequences is $N = e^{nS(Q)}$ (we
turn to this point more formally next). Hence, the above 
theorem has very important implications in the theory of 
statistical inference and distinguishability of probability 
distributions. To see how this comes about we state two 
theorems that give bounds on the size of and probabil- 
ity of a particular type class. The proofs follow directly 
from the above two theorems and the corollary (Cover 
and Thomas, 1991; Csiszar and Korner, 1981). 
Theorem 3. For any type $P \in \mathcal{P}_n$,

$$ \frac{1}{(n+1)^{|A|}} e^{nS(P)} \le |T(P)| \le e^{nS(P)} . $$

This theorem provides the exact bounds on the number of
"typical" sequences. Suppose that we have a probability
distribution $p_1$ and $p_2$ for heads and tails respectively
and we toss the coin $n$ times. The typical (most likely)
sequence will be the one where we have $p_1 n$ heads and
$p_2 n$ tails. The number of such sequences is

$$ \binom{n}{p_1 n} = \frac{n!}{(p_1 n)!\,(p_2 n)!} \approx e^{n(-p_1 \ln p_1 - p_2 \ln p_2)} , $$



i.e. an exponential in n (more tosses, more possibilities) 
and entropy (higher uncertainty, more possibilities). The 
next theorem offers a statistical interpretation to the rel- 
ative entropy. 

Theorem 4. For any type $P \in \mathcal{P}_n$, and any distribution
$Q$, the probability of the type class $T(P)$ under $Q^n$ is
$e^{-nS(P\|Q)}$ to first order in the exponent. More precisely,

$$ \frac{1}{(n+1)^{|A|}} e^{-nS(P\|Q)} \le Q^n(T(P)) \le e^{-nS(P\|Q)} . $$



The meaning of this theorem is that if we draw results
according to $Q$, the probability that they will "look" as if they
were drawn from $P$ decreases exponentially with $n$ and with the
relative entropy between $P$ and $Q$. The closer $Q$ is to $P$
the higher the probability that their statistics will look 
the same. Alternatively, the higher the number of draws, 
n, the smaller the probability that we will confuse the 
two. We present an explicit example below. The above 
two results can be succinctly written in an exponential 
fashion that will be useful to us as 



$$ |T(P)| \to e^{nS(P)} , \qquad (3) $$
$$ Q^n(T(P)) \to e^{-nS(P\|Q)} . \qquad (4) $$



The first statement also leads to the idea of data com- 
pression, where a string of length n generated by a source 
with entropy S can be encoded into a string of length 
nS. The second statement says that if we are performing
n experiments according to distribution Q, the proba- 
bility that we will get something that looks as if it was 
generated by distribution P decreases exponentially with 
n depending on the relative entropy between P and Q. 
This idea immediately leads to Sanov's theorem, whose 
quantum analogue will provide a statistical interpreta- 
tion of one measure of entanglement presented in section 
IV. Now we present examples of data compression and 
introduce Sanov's theorem. 
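
Before turning to these examples, the bounds behind statements (3) and (4) can be checked numerically. The Python sketch below is an illustration under assumptions chosen here: a binary alphabet, the type with counts (10, 20) out of $n = 30$, a fair-coin distribution $Q$, and hypothetical helper names. Everything is computed in logarithms so that the polynomial factor $(n+1)^{|A|}$ appears as an additive correction.

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def relative_entropy(p, q):
    # Assumes q has no zero entries, which holds for the Q used below.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def log_type_class_size(counts):
    """ln |T(P)| with |T(P)| = n! / prod_a N(a)!."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return math.log(size)

def log_type_class_probability(counts, Q):
    """ln Q^n(T(P)): class size times the probability of any one member sequence."""
    return log_type_class_size(counts) + sum(c * math.log(q) for c, q in zip(counts, Q))

counts = [10, 20]
Q = [0.5, 0.5]
n = sum(counts)
P = [c / n for c in counts]
poly = len(counts) * math.log(n + 1)   # ln (n + 1)^{|A|}

log_size = log_type_class_size(counts)
log_prob = log_type_class_probability(counts, Q)

size_ok = n * entropy(P) - poly <= log_size <= n * entropy(P)
prob_ok = -n * relative_entropy(P, Q) - poly <= log_prob <= -n * relative_entropy(P, Q)
```

Both flags come out true for these numbers; the gap between the exact values and the exponential estimates is of order $\ln n$, which is negligible next to $n$ for long sequences.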

Classical data compression. Suppose that we have
a binary source generating 0's with twice as big a probability
as that of 1's, so that the Shannon entropy is
$S = \ln 3 - \frac{2}{3} \ln 2 = 0.64$. Imagine that we have a string
of 15 digits coming out of this source. Then, according to
the above considerations (eq. (3)), the most likely type
will be the one with ten 0's and five 1's. But the logarithm of the size
of this class is only about $0.64 \times 15 \approx 10$. So we can use only
10 digits to encode all the above sequences of 15 numbers
just by assigning the following conventional mapping:
the first sequence of 15 numbers is to be encoded
in 0000000000, the second sequence is to be encoded in
0000000001, ... , the $2^{10}$th sequence is to be encoded in
1111111111. This encoding is for obvious reasons called
data compression. This, in fact, offers a statistical reason 
for employing the Shannon entropy as a measure of un- 
certainty. This result is known as Shannon's lower bound 
(or Shannon's First Theorem) on data compression, i.e. 
a message cannot be compressed per bit to less than its 
Shannon entropy (Shannon and Weaver, 1949). There
are a number of different methods used for compression,
each with varying degree of success dependent on the sta- 
tistical distribution of the message, see e.g. Cover and 
Thomas (1991). 
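
The numbers in the compression example can be reproduced with a few lines of Python. This is a sketch with variable names chosen here for illustration. The figure of 10 in the text is $nS$ with $S$ measured in natural units; the sketch also reports the same quantity in binary digits and the exact size of the most likely type class, so that the leading-order estimate can be compared with the exact count.

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

p = [2 / 3, 1 / 3]   # probabilities of 0 and 1
n = 15

S = entropy(p)                           # about 0.64 in natural units
nats_estimate = n * S                    # leading-order ln of the type class size
bits_estimate = n * S / math.log(2)      # the same estimate in binary digits

typical_class = math.comb(n, 5)          # strings with ten 0s and five 1s
bits_for_class = math.ceil(math.log2(typical_class))   # binary digits to index them
```

For a string as short as 15 digits the polynomial corrections of Theorem 3 are still visible, so the exact count and the exponential estimate differ noticeably; the two agree per symbol only in the limit of long strings.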

Now we look at the distinguishability of two proba- 
bility distributions. Suppose we would like to check if 
a given coin is "fair", i.e. if it generates a "head-tail" 
distribution of $f = (1/2, 1/2)$. When the coin is biased
then it will produce some other distribution, say
$uf = (1/3, 2/3)$. So, our question of the coin fairness
boils down to how well we can differentiate between two 
given probability distributions given a finite, n, number 
of experiments to perform on one of the two distribu- 
tions. In the case of a coin we would toss it n times and 
record the number of 0's and 1's. From simple statistics
(Cover and Thomas, 1991) we know that if the coin
is fair then the number of 0's, $N(0)$, will be roughly
$n/2 - \sqrt{n} < N(0) < n/2 + \sqrt{n}$, for large $n$, and the
same for the number of 1's. So if our experimentally determined
values do not fall within the above limits, the
coin is not fair. We can look at this from another point of 
view which is in the spirit of the method of types; namely, 
what is the probability that a fair coin will be mistaken 
for an unfair one with the distribution of (1/3, 2/3) given 
n trials of the fair coin? For large n the answer is given 
in the previous subsection 

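$$ p(\mathrm{fair} \to \mathrm{unfair}) = e^{-n S_{cl}(uf\|f)} , $$

where $S_{cl}(uf\|f) = \frac{1}{3} \ln \frac{1}{3} + \frac{2}{3} \ln \frac{2}{3} - \frac{1}{3} \ln \frac{1}{2} - \frac{2}{3} \ln \frac{1}{2}$
is the Shannon relative entropy for the two
distributions. So,

$$ p(\mathrm{fair} \to \mathrm{unfair}) = 3^n 2^{-\frac{5}{3} n} , $$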

which tends exponentially to zero with $n \to \infty$. In fact
we see that already after ^ 20 trials the probability 
of mistaking the two distributions is vanishingly small, 
< 10" This leads to the following important result 
(Sanov, 1957). 




FIG. 1. The concept of distinguishability is illustrated. 
What do we mean by the distance from the cyclist to the 
city in the figure? It is defined as the distance from the cy- 
clist to the closest house in the city. Also, which distance 
measure should be chosen to appropriately measure this? In 
the text I argue that when it comes to distinguishing between 
two or more probability distributions the most appropriate 
measure is the relative entropy. 



Sanov's theorem. If we have a probability distribution
$Q$ and a set of distributions $E \subset \mathcal{P}$ then

$$ Q^n(E) \to e^{-nS(P^*\|Q)} , $$

where $P^*$ is the distribution in $E$ that is closest to $Q$
using the Shannon relative entropy (see Fig. 1).

This can also be rephrased in the language of distin- 
guishability: when we are distinguishing a given distri- 
bution from a set of distributions, then what matters is 
how well we can distinguish that distribution from the 
closest one in the set (see Fig. 1). When we turn to 
the quantum case later, the probability distributions will 
become quantum densities representing various states of 
a quantum system, and the question will be how well we 
can distinguish between these states. Note that we could
also talk about $Q$ coming from a set of states, in which
case we would have $S(P\|Q^*)$, $Q^*$ being the state that
minimizes the relative entropy (i.e. the closest state).
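
The coin example and the "closest distribution" reading of Sanov's theorem can be sketched numerically. In the Python below, the set `E` (coins whose probability of the first outcome is at most 1/3, sampled on a grid) and all function names are hypothetical choices made for illustration; only the two coin distributions come from the text. The exponent $S_{cl}(uf\|f)$ is evaluated explicitly, so the number of trials needed for a given confidence can be read off directly.

```python
import math

def relative_entropy(p, q):
    # Assumes q has no zero entries, which holds for the fair coin used below.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

fair = [0.5, 0.5]
unfair = [1 / 3, 2 / 3]

# Exponent per trial, equal to (5/3) ln 2 - ln 3.
rate = relative_entropy(unfair, fair)

def confusion_probability(n):
    """Leading-order probability e^{-n S(uf||f)} that n fair tosses look unfair."""
    return math.exp(-n * rate)

def trials_needed(threshold):
    """Smallest n for which the leading-order estimate drops below the threshold."""
    return math.ceil(math.log(1 / threshold) / rate)

def closest_in_set(candidates, q):
    """The member P* of the set with the smallest relative entropy S(P||q)."""
    return min(candidates, key=lambda p: relative_entropy(p, q))

# Hypothetical set E: first-outcome probability k/30 for k = 0, ..., 10.
E = [[k / 30, 1 - k / 30] for k in range(11)]
p_star = closest_in_set(E, fair)
```

Within this set the closest member to the fair coin is the boundary point (1/3, 2/3), so the decay of the probability of the whole set is governed by the same exponent as in the single-distribution example.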

B. Other information measures from relative entropy 

Another important concept derived from the relative 
entropy concerns gathering information. When one sys- 
tem learns something about another one, their states be- 
come correlated. How correlated they are, or how much 
information they have about each other, can be quanti- 
fied by the mutual information. 

Definition. The Shannon mutual information between
two random variables $A$ and $B$, having a joint probability
distribution $p(a_i, b_j)$, and therefore marginal probability
distributions $p(a_i) = \sum_j p(a_i, b_j)$ and
$p(b_j) = \sum_i p(a_i, b_j)$, is defined as

$$ I_S(A:B) := S(p(a)) + S(p(b)) - S(p(a,b)) . \qquad (6) $$

We now present two very instructive ways of looking at 
this quantity, which will form a basis for the review. 
Mathematically, $I_S$ can be written in terms of the Shannon
relative entropy. In this sense it represents a distance
between the distribution $p(a,b)$ and the product of the
marginals $p(a) \times p(b)$. As such, it is intuitively clear
that this is a good measure of correlations, since it shows
how far a joint distribution is from the product one in
which all the correlations have been destroyed, or alternatively,
how distinguishable a correlated state is from a
completely uncorrelated one. So, we have

$$ I_S(A:B) = S(p(a,b)\|p(a) \times p(b)) . $$




FIG. 2. The Venn diagram representation of the joint 
Shannon entropy of two random variables as well as the 
marginal Shannon entropies. It is clear that geometrically 
the Shannon mutual information is obtained by summing the 
marginal entropies and subtracting the total entropy. It is in- 
teresting to note that its generalisation fails for three or more 
random variables. 

Let us now view this from another angle. Suppose that
we wish to know the probability of observing $b_j$ if $a_i$ has
been observed. This is called a conditional probability
and is given by:

$$ p_{a_i}(b_j) = \frac{p(a_i, b_j)}{p(a_i)} . $$

This motivates us to introduce a conditional entropy,
$S_A(B)$, as:

$$ S_A(B) = -\sum_i p(a_i) \sum_j p_{a_i}(b_j) \ln p_{a_i}(b_j) $$
$$ = -\sum_{ij} p(a_i, b_j) \ln p_{a_i}(b_j) . $$

This quantity tells us how uncertain we are about the
value of $B$ once we have learned about the value of $A$.
Now the Shannon mutual information can be rewritten
as

$$ I_S(A:B) = S(B) - S_A(B) = S(A) - S_B(A) . \qquad (7) $$

So, the Shannon mutual information, as its name in- 
dicates, measures the quantity of information conveyed 
about the random variable $A$ ($B$) through measurements
of the random variable $B$ ($A$). This quantity, being positive,
tells us that the initial uncertainty in $B$ ($A$) can in no
way be increased by making observations on $A$ ($B$). Note
also that, unlike the Shannon relative entropy, the Shan- 
non mutual information is symmetric (see Fig. 2). The 
following example demonstrates the symmetry of Shan- 
non's mutual information. 

Let us briefly go back to our original idea of a surprise 
to interpret the Shannon mutual information as a measure
of correlations. Suppose that one of our friends likes
to wear socks of two colours only: red and blue. In addition
we know that her socks are always the same colour
and that when she gets up in the morning, she randomly
chooses the colour, but we know that she prefers blue to
red with the ratio 3:1. So, when we meet our friend,
before we have looked at the colour of her socks, we know
that she wears blue socks with the probability $p(b) = 0.75$
and red socks with the probability $p(r) = 0.25$. However,
when we look at one sock and observe, say, the colour
blue, we immediately know that the other sock must be
blue, too. This means that the colours of her two socks
are correlated. So, before we look at one of the socks, we
are uncertain about the colour of the other sock by an
amount of $-0.75 \ln 0.75 - 0.25 \ln 0.25$. But then, when we
look at one of them the uncertainty immediately disap- 
pears. So, we expect that the information we gain about 
one sock by looking at the colour of the other is given 
by $-0.75 \ln 0.75 - 0.25 \ln 0.25$. The Shannon mutual information
predicts exactly the same thing. We see that
the largest correlations would be if $p(r) = p(b) = 0.5$ and
would be $\ln 2$. This, of course, agrees with our intuitive
notion of surprise, since then, before looking at her one
sock, we would be completely uncertain about the colour
of the other sock. Therefore by observing its colour we
obtain the largest possible amount of information (i.e. 
remove the largest possible uncertainty in this case). 
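
The sock example can be worked through numerically. The following Python sketch is illustrative: the representation of the joint distribution as a nested list `joint[i][j]` and the function names are assumptions made here. It evaluates the mutual information both as a combination of entropies and as the relative entropy between the joint distribution and the product of its marginals.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(joint):
    """Row sums and column sums of the joint distribution joint[i][j]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return pa, pb

def mutual_information(joint):
    """I(A:B) = S(A) + S(B) - S(A,B)."""
    pa, pb = marginals(joint)
    flat = [x for row in joint for x in row]
    return entropy(pa) + entropy(pb) - entropy(flat)

def mutual_information_as_distance(joint):
    """The same quantity written as S(p(a,b) || p(a) p(b))."""
    pa, pb = marginals(joint)
    return sum(pij * math.log(pij / (pa[i] * pb[j]))
               for i, row in enumerate(joint)
               for j, pij in enumerate(row) if pij > 0)

# Rows: colour of the first sock (blue, red); columns: colour of the second sock.
socks = [[0.75, 0.0], [0.0, 0.25]]
balanced_socks = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

expected = -0.75 * math.log(0.75) - 0.25 * math.log(0.25)
```

Both expressions give the value quoted in the text for the 3:1 preference, $\ln 2$ for equal preferences, and zero when the two colours are chosen independently.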

Although it will be seen that the Shannon mutual in- 
formation is a good measure of correlations between two 
random variables, its natural generalization to three and 
more random variables fails. It is easy to see that for
three random variables the Shannon mutual information
should be of the following form:

$$ I_S(A:B:C) = S(A,B,C) - S(A,B) - S(A,C) - S(B,C) + S(A) + S(B) + S(C) . \qquad (8) $$

However, there exist $A, B, C$ such that $I_S(A:B:C) < 0$
(I leave this as an exercise for the reader), and since we 
regard the amount of correlation as being strictly posi- 
tive, this is automatically ruled out as a good measure 
of correlation. A way to side-step this difficulty is to 
define mutual information via the relative entropy as 
$S(p(A,B,C)\|p(A)p(B)p(C))$. This is a positive quantity
representing the distance between the joint three random 
variables probability distribution from the product of the 
corresponding marginals. This, of course, immediately 
generalizes to any number of random variables. Next I 
show why the relative entropy and mutual information 
are also very useful from the dynamical perspective. 
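
One possible example for which expression (8) is negative is given by two independent fair bits $A$ and $B$ together with their exclusive-or $C$. This choice, the dictionary representation of the joint distribution and the function names below are assumptions made for illustration; other examples exist. The sketch evaluates (8) and the relative-entropy alternative for this distribution.

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginal(joint, keep):
    """Marginal distribution over the variables whose positions are listed in keep."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def S(joint, keep):
    return entropy(marginal(joint, keep).values())

def triple_information(joint):
    """The candidate three-variable mutual information of eq. (8)."""
    return (S(joint, (0, 1, 2))
            - S(joint, (0, 1)) - S(joint, (0, 2)) - S(joint, (1, 2))
            + S(joint, (0,)) + S(joint, (1,)) + S(joint, (2,)))

def distance_from_product(joint):
    """S(p(A,B,C) || p(A) p(B) p(C))."""
    pa, pb, pc = (marginal(joint, (k,)) for k in range(3))
    return sum(p * math.log(p / (pa[(a,)] * pb[(b,)] * pc[(c,)]))
               for (a, b, c), p in joint.items() if p > 0)

# A and B are independent fair bits, C = A XOR B.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
```

For this distribution expression (8) evaluates to $-\ln 2$, whereas the relative-entropy definition gives $+\ln 2$, in line with the requirement that a measure of correlation be positive.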

C. Classical evolution and relative entropy 

The above application of relative entropy to physics via 
the concept of distinguishability might seem contrived.
This is, however, not at all the case, and this section 
shows the great importance of the relative entropy for 
the dynamics of classical systems. A state of a physical 
system in classical mechanics can be represented as a vector
whose entries are various probabilities for the system
to occupy its different possible states. The evolution of
this system is seen as the change of these probabilities
with time. So, the evolution is a linear transformation of
a state into another state, i.e. of a vector into another
vector,

$$ q_j = \sum_k P(j|k) p_k , $$

where $P(j|k)$ is the conditional probability for the system
to change from the state $k$ to the state $j$. Because
the probability has to be conserved ($\sum_j q_j = 1$), we have
that $\sum_j P(j|k) = 1$. Matrices with this simple property,
namely that their entries are positive and columns
sum up to 1, are called stochastic. The above can be
generalised to continuous systems and continuous time 
evolution, but this will not be relevant for the rest of this 
review. 

A very important property of any measure that aims at 
quantifying the amount of correlations between two ran- 
dom variables (i.e. two states of the same or two different
systems in classical mechanics) is the following: if either 
or both of the variables undergo a local stochastic evo- 
lution, then the amount of correlations cannot increase 
(in fact, it usually decreases). We now prove this in the 
case of the Shannon mutual information, following an ap- 
proach similar to that given by Everett (1973) (see also 
Penrose's excellent book on Statistical mechanics; Pen- 
rose, 1973). 

First, we establish without proof two inequalities following
from the convex properties of the logarithmic
functions (Everett, 1973). Lemma 1 states that entropy
is a concave function, whereas lemma 2 states that the
relative entropy is a convex function.
Lemma 1. $\left(\sum_i P_i x_i\right) \ln \left(\sum_i P_i x_i\right) \le \sum_i P_i x_i \ln x_i$, where
$x_i \ge 0$, $P_i \ge 0$ and $\sum_i P_i = 1$.

Physically, this inequality means that the average uncer- 
tainty (negative of the right hand side) is less than or 
equal to the uncertainty of the average (negative of the 
left hand side); in other words, mixing probability distri- 
butions increases entropy. This is a very important prop- 
erty of entropy as a measure of uncertainty since when 
we mix probability distributions we expect to increase 
our uncertainty. 

Lemma 2. $\left(\sum_i x_i\right) \ln \frac{\sum_i x_i}{\sum_i a_i} \le \sum_i x_i \ln \frac{x_i}{a_i}$, where $x_i \ge 0$
and $a_i \ge 0$ for all $i$.

This is just a statement of the fact that mixing decreases 
distinguishability. Note that this is in accord with the 
lemma 1, since the more mixed the probability distribu- 
tions, the less distinguishable they are. 
These two simple and self-evident statements lead to a 
very important result that the Shannon relative entropy 
between two probability distributions decreases when 
the same two undergo a stochastic process. This is a 
very satisfying property from the physical point of view,
where two probability distributions undergoing stochastic
changes, in fact, represent two evolving physical systems.
It says that two probability distributions are in
some sense closer to each other (i.e. "harder to distin- 
guish") after a stochastic process, or analogously, that 
two physical systems become more alike. 

So, we consider a sequence of transition-probability
matrices $T^n_{ij} := P_n(i|j)$, where $\sum_i T^n_{ij} = 1$ for all $n, j$
and $0 \le T^n_{ij} \le 1$. We also introduce a sequence of positive
measures (i.e. probability distributions) $a^n$ having
the property that

$$a^{n+1}_i = \sum_j T^n_{ij}\, a^n_j .$$

Transition probabilities $T$ tell us the probability that at
the $n$th step of evolution the system will "jump" from the
$j$th to the $i$th state. Thus constructed transition matrices
are stochastic for all $n$. We further suppose that we have
a sequence of probability distributions $p^n$ generated by
the action of the above stochastic process, such that

$$p^{n+1}_i = \sum_j T^n_{ij}\, p^n_j .$$

This is the law describing the system's evolution in time,
and the state of the system at time $n$ is given by the
probabilities $p^n$. For each of these probability distributions
the relative entropy $S^n$ is defined as

$$S^n(p\|a) := S(p^n\|a^n) = \sum_i p^n_i \ln \frac{p^n_i}{a^n_i} .$$



We prove the following theorem: 
Distinguishability never increases. 

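$$S^{n+1}(p\|a) \le S^n(p\|a) .$$

Proof. Expanding $S^{n+1}(p\|a)$ we obtain:

$$S^{n+1}(p\|a) = \sum_i p^{n+1}_i \ln \frac{p^{n+1}_i}{a^{n+1}_i}
= \sum_i \sum_j T^n_{ij}\, p^n_j \ln \frac{\sum_j T^n_{ij}\, p^n_j}{\sum_j T^n_{ij}\, a^n_j} . \tag{9}$$

However, using lemma 2 we have the following inequality

$$\sum_j T^n_{ij}\, p^n_j \ln \frac{\sum_j T^n_{ij}\, p^n_j}{\sum_j T^n_{ij}\, a^n_j}
\le \sum_j T^n_{ij}\, p^n_j \ln \frac{p^n_j}{a^n_j} .$$

From the above two it follows that

$$S^{n+1}(p\|a) \le \sum_i \sum_j T^n_{ij}\, p^n_j \ln \frac{p^n_j}{a^n_j}
= \sum_j p^n_j \ln \frac{p^n_j}{a^n_j} = S^n(p\|a) ,$$

where the first equality uses $\sum_i T^n_{ij} = 1$,
and the proof is completed □.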
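
The theorem can be illustrated numerically. The following is a minimal sketch, assuming the convention that `T[i][j]` is the probability of jumping from state `j` to state `i` (so that every column sums to one); the helper names and the random test data are illustrative only.

```python
import math
import random


def relative_entropy(p, a):
    """Shannon relative entropy sum_i p_i ln(p_i / a_i), with 0 ln 0 := 0."""
    return sum(pi * math.log(pi / ai) for pi, ai in zip(p, a) if pi > 0)


def random_distribution(n, rng):
    """Strictly positive random probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]


def random_stochastic_matrix(n, rng):
    """T[i][j] = probability of jumping from state j to state i (columns sum to 1)."""
    cols = [random_distribution(n, rng) for _ in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def evolve(T, p):
    """One step of the evolution law p_i -> sum_j T[i][j] p_j."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]


def entropy_trajectory(n=4, steps=6, seed=1):
    """Relative entropy S^n(p||a) along a random, time-dependent stochastic process."""
    rng = random.Random(seed)
    p, a = random_distribution(n, rng), random_distribution(n, rng)
    values = [relative_entropy(p, a)]
    for _ in range(steps):
        T = random_stochastic_matrix(n, rng)  # a fresh T^n at every step
        p, a = evolve(T, p), evolve(T, a)
        values.append(relative_entropy(p, a))
    return values


trajectory = entropy_trajectory()
```

The list `trajectory` should be non-increasing from one step to the next, whatever seed is chosen.
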

This property means that a distance between two states 
cannot increase with time if the states evolve under any 
stochastic map. The proof can be immediately specialized
to the cases when $T$ is stationary, i.e. $T$ is independent
of $n$, and when $T$ is doubly stochastic, i.e. $\sum_i T_{ij} = 1$
for all $j$ and $\sum_j T_{ij} = 1$ for all $i$. A corollary to this
important theorem is the following:

Corollary. If we take $p = p(a,b)$ and $a = p(a)p(b)$,
and suppose that the stochastic processes acting separately
on A and B are uncorrelated, we see that the Shannon
mutual information does not increase under these local 
stochastic processes (by local we mean that they act sep- 
arately on A and B). 
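
As an illustration of the corollary, the sketch below applies independent bit-flip noise to two perfectly correlated bits and compares the Shannon mutual information before and after. The bit-flip probabilities (0.1 and 0.2) and the function names are assumptions made for this example only.

```python
import math


def mutual_information(joint):
    """I(A:B) = sum_ab p(a,b) ln[p(a,b) / (p(a) p(b))] for a joint table joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(pab * math.log(pab / (pa[i] * pb[j]))
               for i, row in enumerate(joint)
               for j, pab in enumerate(row) if pab > 0)


def apply_local(joint, TA, TB):
    """Independent local maps: p'(a',b') = sum_ab TA[a'][a] TB[b'][b] p(a,b)."""
    return [[sum(TA[i][a] * TB[j][b] * joint[a][b]
                 for a in range(len(joint)) for b in range(len(joint[0])))
             for j in range(len(TB))] for i in range(len(TA))]


def flip(eps):
    """Binary symmetric channel: flips the bit with probability eps."""
    return [[1 - eps, eps], [eps, 1 - eps]]


joint = [[0.5, 0.0], [0.0, 0.5]]  # two perfectly correlated bits
before = mutual_information(joint)
after = mutual_information(apply_local(joint, flip(0.1), flip(0.2)))
```

Here `before` equals $\ln 2$, and `after` is strictly smaller, since neither local map can create correlations on its own.
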

This is a very important, and physically intuitive, 
property of any measure of correlations; its quantum ana- 
logue will be of central importance for quantifying quan- 
tum correlations between entangled subsystems. This 
corollary, in fact, can be taken as a guidance for a "good" 
measure of correlations. We can state that any mea- 
sure of correlations has to be non-increasing under lo- 
cal stochastic processes. In other words this means that 
the only way that the systems can become more correlated,
i.e. that they gain more information about each
other, is if they interact. Without mutual interaction 
the correlations can only decrease or at best stay the 
same. The nature of quantum local stochastic processes 
will form the physical basis for our argument in the next 
section. A condition similar to the property above, but employing
quantum stochastic processes, will be a key element
in our search for measures of entanglement. When
we go to quantum mechanics, the notion of a probabil- 
ity distribution will be replaced by a quantum state (i.e. 
density matrix), and a stochastic process will become a 
measurement process in quantum theory. The formu- 
lation of probability theory that is most naturally gen- 
eralized to quantum states is provided by Kolmogorov 
(1950), and the quantum generalization expressing sim- 
ilarities with von Neumann's Hilbert Space formulation 
(von Neumann, 1932) can be found in Mackey (1963) (cf.
Holevo, 1982). However, knowledge of this approach will 
not be necessary for the rest of the review. Finally it is 
important to stress that if the local stochastic processes 
are correlated they virtually become global, and there- 
fore the correlations between the systems can increase as 
well as decrease. 



D. Schmidt Decomposition and Quantum Dynamics 

The difference between classical and quantum physics
can be seen in the fact that quantum states are described
by a density matrix $\rho$ (and not just vectors). The density
matrix is a positive semi-definite Hermitian matrix,
whose trace is unity (representing the fact that all the
probabilities add up to 1). An important class of density
matrices is the idempotent one, i.e. $\rho^2 = \rho$. The states
these matrices represent are called pure states. When
there is no uncertainty in the knowledge of the state of 
the system its state is then pure. Another important 
notion is that of a composite system. A composite quan- 
tum system is one that consists of a number of quantum 
subsystems. When those subsystems are entangled it is
impossible to ascribe a definite state vector to any one 
of them. The most often quoted entangled system is a 
pair of two photons, being in the "EPR" state (Einstein 
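et al., 1935; Bell, 1987). The composite system is then
mathematically described by

$$|\Psi\rangle = \frac{1}{\sqrt{2}}\left(|{\uparrow}\rangle|{\downarrow}\rangle + |{\downarrow}\rangle|{\uparrow}\rangle\right) \tag{10}$$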

where the first ket in either product belongs to one pho- 
ton and the second to the other. The property that is 
described is the direction of spin or polarization along the 
z-axis, which can either be "up" ($|{\uparrow}\rangle$) or "down" ($|{\downarrow}\rangle$).
A two level system of this type is a quantum analogue 
of a bit, which we shall henceforth call a qubit. We can 
immediately see that neither of the photons possesses a 
definite state vector. The best that one can say is that if 
a measurement is made on one photon, and it is found to 
be in the state "up" for example, then the other photon 
is certain to be in the state "down" . This idea cannot be 
applied to a general composite system, unless the former 
is written in a special form. This motivates us to in- 
troduce the so called Schmidt decomposition (Schmidt, 
1907), which not only is mathematically convenient, but 
also gives a deeper insight into correlations between the 
two subsystems. 

According to the rules of quantum mechanics the state
vector of a composite system, consisting of subsystems
U and V, is represented by a vector belonging to the
tensor product of the two Hilbert spaces $\mathcal{H}_U \otimes \mathcal{H}_V$. The
general state of this system can be written as a linear
superposition of products of individual states:

$$|\Psi\rangle = \sum_n \sum_m c_{nm} |u_n\rangle |v_m\rangle \tag{11}$$

where $\{|u_n\rangle\}_{n=1}^{N}$ and $\{|v_m\rangle\}_{m=1}^{M}$ are orthonormal bases
of the subsystems U and V respectively, whose dimensions
are $\dim U = N$ and $\dim V = M$. This state can
always be written in the so called Schmidt form:

$$|\Psi\rangle = \sum_n g_n |u'_n\rangle |v'_n\rangle , \tag{12}$$

where $|u'_n\rangle$ and $|v'_n\rangle$ are orthonormal bases for U and
V respectively. Note that in this form the correlations
between the two subsystems are fully displayed. If U is
found in the state $|u'_2\rangle$ for example, then the state of V
is $|v'_2\rangle$. This is clearly a multi state generalization of the
EPR state mentioned earlier.

I will now prove this assertion by showing how to de- 
rive eq. (12) from eq. (11). To that end, let us assume 
that M > N, which in no way affects our line of argu- 
ment since the procedure is symmetric with respect to 
the subsystems. Then we have the following five steps: 
1. First we construct a density matrix describing
$|\Psi\rangle = \sum_n \sum_m c_{nm} |u_n\rangle |v_m\rangle$. Once the density matrix
is known all the properties of the system can
be deduced from it. Moreover, ensembles which
are prepared differently, but have the same density
matrix are statistically indistinguishable and
therefore equivalent. Generally, if we have a mixed
state involving vectors $|\Psi_1\rangle, |\Psi_2\rangle, \ldots, |\Psi_D\rangle$ with
corresponding classical probabilities $w_1, w_2, \ldots, w_D$,
then the density matrix is defined to be:

$$\rho = \sum_{i=1}^{D} w_i |\Psi_i\rangle\langle\Psi_i| .$$

Since in our case $|\Psi\rangle$ is a pure state, the density
matrix is a projection operator on to $|\Psi\rangle$, i.e.

$$\rho = |\Psi\rangle\langle\Psi| = \sum_{nm} \sum_{pq} \rho_{nmpq} |u_n\rangle\langle u_p| \otimes |v_m\rangle\langle v_q|$$

where $\rho_{nmpq} = c_{nm} c^*_{pq}$. If we, however, wish to deal
with one of the subsystems only, then we employ
the concept of the reduced density matrix.

2. We find the reduced density matrix of the subsystem
U, obtained by tracing $\rho$ over all states of the
subsystem V, so that

$$\rho_U = \sum_q \langle v_q|\rho|v_q\rangle = \sum_{nm} \sum_p \rho_{nmpm} |u_n\rangle\langle u_p| .$$

Note that the partial trace (or the trace itself)
does not depend on the choice of basis. Partial
tracing is analogous to finding marginal probability
distributions from a joint probability distribution
in classical probability theory. The crucial
step in the Schmidt decomposition is diagonalizing
the above. I shall call the eigenvalues of $\rho_U$
$|g_1|^2, |g_2|^2, \ldots, |g_N|^2$, and the corresponding eigenvectors
$|u'_1\rangle, |u'_2\rangle, \ldots, |u'_N\rangle$.

3. Then I re-express $|\Psi\rangle$ in terms of the $|u'\rangle$'s, i.e.

$$|\Psi\rangle = \sum_n \sum_m c'_{nm} |u'_n\rangle |v_m\rangle .$$

4. Now, we construct a new orthonormal basis of the
subsystem V such that each new vector is a "clever"
linear superposition of the old ones, so that

$$|v'_n\rangle = \sum_m \frac{c'_{nm}}{g_n} |v_m\rangle .$$

The matrix given by the coefficients $c'_{nm}/g_n$ is unitary
which is why the new basis is orthonormal.

5. The Schmidt decomposition of $|\Psi\rangle$ is now given by

$$|\Psi\rangle = \sum_n g_n |u'_n\rangle |v'_n\rangle .$$
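
The five steps above can be cross-checked numerically. The sketch below does not follow the steps literally: it swaps in the singular value decomposition of the coefficient matrix $c_{nm}$ (NumPy's `numpy.linalg.svd`), which yields the same Schmidt form with real, non-negative coefficients (any phases of the $g_n$ are absorbed into the basis vectors). The dimensions, the random state and the function name are assumptions for illustration.

```python
import numpy as np


def schmidt_decomposition(c):
    """Return (g, u_prime, v_prime) for the N x M coefficient matrix c_{nm}.

    Columns of u_prime and v_prime hold the Schmidt vectors expressed in the
    original bases {|u_n>} and {|v_m>}.
    """
    u, g, vh = np.linalg.svd(c, full_matrices=False)  # c = u @ diag(g) @ vh
    return g, u, vh.T


rng = np.random.default_rng(0)
N, M = 2, 3
c = rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))
c /= np.linalg.norm(c)  # normalise |Psi>

g, u_prime, v_prime = schmidt_decomposition(c)

# Rebuild c_{nm} = sum_k g_k u'_k(n) v'_k(m) and compare with the input.
rebuilt = sum(g[k] * np.outer(u_prime[:, k], v_prime[:, k]) for k in range(len(g)))

# Reduced density matrix of U in the original basis: (rho_U)_{np} = sum_m c_{nm} c*_{pm}.
rho_U = c @ c.conj().T
```

The eigenvalues of `rho_U` coincide with `g**2`, which is the content of step 2, and at most `min(N, M)` Schmidt coefficients are non-zero.
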

There are two important observations to be made,
which are fundamental to understanding correlations between
the two subsystems in a joint pure state:

• The reduced density matrices of both subsystems,
written in the Schmidt basis, are diagonal and have
the same positive spectrum. In particular, the overall
density matrix is given by

$$\rho = \sum_{nm} g_n g^*_m |u'_n\rangle\langle u'_m| \otimes |v'_n\rangle\langle v'_m|$$

whereas the reduced ones are

$$\rho_U = \sum_m \langle v'_m|\rho|v'_m\rangle = \sum_n |g_n|^2 |u'_n\rangle\langle u'_n|$$

$$\rho_V = \sum_n \langle u'_n|\rho|u'_n\rangle = \sum_m |g_m|^2 |v'_m\rangle\langle v'_m| .$$

• If a subsystem is $N$-dimensional it then can be entangled
with no more than $N$ orthogonal states of
another one.

I would like to point out that the Schmidt decomposi- 
tion is, in general, impossible for more than two entangled 
subsystems. To clarify this I consider three entangled 
subsystems as an example. Here, our intention would be
to write a general state such that by observing the state
of one of the subsystems we instantaneously and with
certainty know the state of the other two. But, this is 
impossible in general, for the presence of the third system 
makes the prediction uncertain. Loosely speaking, while 
we know the state of one of the subsystems, the other two 
might still be entangled and cannot have definite vectors 
associated with them (an exception to this general rule is, 
for example, a state of the Greenberger-Horne-Zeilinger 
(GHZ) type $(1/\sqrt{2})(|{\uparrow}\rangle|{\uparrow}\rangle|{\uparrow}\rangle + |{\downarrow}\rangle|{\downarrow}\rangle|{\downarrow}\rangle)$). Clearly,
involvement of even more subsystems complicates this 
analysis even further and produces, so to speak, an even 
greater mixture and uncertainty. The same reasoning 
applies to mixed states of two or more subsystems (i.e. 
states whose density operator is not idempotent, $\rho^2 \neq \rho$),
for which we cannot have the Schmidt decomposition in 
general. This reason alone is responsible for the fact that 
the entanglement of two subsystems in a pure state is sim- 
ple to understand and quantify, while for mixed states, 
or states consisting of more than two subsystems, the 
question is much more involved. 

We now discuss the way quantum systems evolve. An 
isolated system, of course, follows a unitary dynamics 
generated by Schrödinger's equation (non-relativistic).
This evolution is fully reversible (manifesting itself in the
fact that the quantum entropy does not increase during
this process as we will see below). However, we know that
most of the processes in Nature are irreversible (think of 
the spontaneous emission and the non-existence of its re- 
verse - "spontaneous absorption"). These processes are 
non-unitary and arise from the interaction of the sys- 
tem with the environment; thus, the system is no longer 
closed. Mathematically, the evolution of a quantum state 
is then most generally of the form (Davies, 1976) 

$$\rho' = \sum_\alpha A_\alpha \rho A^\dagger_\alpha \tag{13}$$

where, because of the conservation of probability, or,
more precisely, trace preservation, $\sum_\alpha A^\dagger_\alpha A_\alpha = 1$. The
above map is the most general completely positive (trace 
preserving) linear map (CP-map) (Choi, 1975). Positiv- 
ity means that density matrices are mapped into den- 
sity matrices (strictly speaking, positive operators are 
mapped onto positive operators). To define "complete", 
we first need to introduce the idea of an extended state. 
By extension of a state I mean any state on a larger 
Hilbert space that reduces itself to the original state 
when the extended part is traced out. In turn, com- 
pleteness means that any extension of the density matrix 
is also mapped into a density matrix. To clarify this I 
will present a few examples of CP-maps: 

• Projectors are Hermitian idempotent operators
($P^\dagger = P$ and $P^2 = P$) and the evolution of the
form $\rho \to \sum_i P_i \rho P_i$ is a CP-map;

• Addition of another system to $\rho$ is also a CP-map,
$\rho \to \rho \otimes \rho_1$;

• Let $E_i \ge 0$ and $\sum_i E_i = 1$. Then, $\rho \to p_k := \mathrm{Tr}(\rho E_k)$
is a CP-map which generates a probability
distribution from a density matrix.

• Unitary evolution is a special case of CP-map,
where only one operator is present in the sum, i.e.
$\rho \to U \rho U^\dagger$.

I leave it for the reader to show that the above CP maps 
can indeed be written in the form in eq. (13). We will 
meet other examples in the next subsection. 

Remarkably, not all positive maps are completely positive,
transposition being a well known example. Positivity
of transposition follows from the fact that for any state
$\rho$, its transposition $\rho^T \ge 0$. However, a counter example
to completeness comes from, for example, a singlet state
of two sub-systems A and B. Namely, if we transpose
only A (or B), then the resulting operator is not positive
(so that it is not a physical state), i.e. $\rho^{T_A} \not\geq 0$.
Confirmation of this is left as an exercise.
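
One way to carry out this exercise is sketched below, assuming the singlet is written as $(|{\uparrow}\rangle_A|{\downarrow}\rangle_B - |{\downarrow}\rangle_A|{\uparrow}\rangle_B)/\sqrt{2}$ and the basis is ordered as $|{\uparrow\uparrow}\rangle, |{\uparrow\downarrow}\rangle, |{\downarrow\uparrow}\rangle, |{\downarrow\downarrow}\rangle$ (both are conventions chosen here, not fixed by the text). The partial transpose on A swaps the two A indices of each matrix element.

```latex
\rho = |\psi^-\rangle\langle\psi^-| = \frac{1}{2}
\begin{pmatrix}
 0 & 0 & 0 & 0 \\
 0 & 1 & -1 & 0 \\
 0 & -1 & 1 & 0 \\
 0 & 0 & 0 & 0
\end{pmatrix},
\qquad
\rho^{T_A} = \frac{1}{2}
\begin{pmatrix}
 0 & 0 & 0 & -1 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 -1 & 0 & 0 & 0
\end{pmatrix},
\qquad
\rho^{T_A}\,\frac{|{\uparrow\uparrow}\rangle + |{\downarrow\downarrow}\rangle}{\sqrt{2}}
 = -\frac{1}{2}\,\frac{|{\uparrow\uparrow}\rangle + |{\downarrow\downarrow}\rangle}{\sqrt{2}} .
```

The spectrum of $\rho^{T_A}$ is therefore $\{1/2, 1/2, 1/2, -1/2\}$: the trace is still one, but the negative eigenvalue shows that $\rho^{T_A}$ is not a density matrix.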








FIG. 3. The most general evolution in quantum mechanics
is represented by a completely positive trace preserving map
(CP-map). This figure shows two equivalent ways of representing
a CP-map: (a) the canonical form $\sum_\alpha A_\alpha (\cdot) A^\dagger_\alpha$, and (b)
via the extension to a larger Hilbert space $\mathcal{H}_E$ and an appropriate
unitary transformation therein. The connection is
explained in the text.

The reader might wonder what the physical implementation
of the canonical form $\sum_\alpha A_\alpha \rho A^\dagger_\alpha$ is. I will
now introduce another representation of the CP-maps
that will explain its physical importance and will be crucial
for the rest of the review. Loosely stated, any CP-map
can be represented as a unitary transformation on a
higher Hilbert space (see Fig. 3). Namely, from Schmidt
decomposition we know that a density matrix can be represented
as a "reduction" of a state in an enlarged Hilbert
space. Suppose that $\rho \in \mathcal{H}$ and that $\rho_E \in \mathcal{H} \otimes \mathcal{H}_a$ is an
"extension" of the state $\rho$ such that $\mathrm{Tr}_a \rho_E = \rho$. Then a
CP-map $\sigma = \Phi(\rho)$ can be represented as

$$\rho \to \rho_E \to U \rho_E U^\dagger \to \mathrm{Tr}_a(U \rho_E U^\dagger) = \sigma \tag{14}$$

Here we have first "lifted" $\rho$ to $\rho_E$, then evolved $\rho_E$ unitarily
into $U \rho_E U^\dagger$ which, after tracing over the Hilbert
space extension (i.e. lowering), yields the final state $\sigma$ as
in Fig. 3. The fact that for any CP-map there exists a
unitary operator $U$ which will execute this map on some
higher Hilbert space is guaranteed by a theorem proved
independently by Kraus (1983) and Ozawa (1984) (see
Schumacher, 1996 for a modern presentation). I will now
only present a plausibility argument for this correspondence.
Let $\rho_E = \rho \otimes |0\rangle\langle 0|_a$ where $|0\rangle\langle 0|_a \in \mathcal{H}_a$. Then

$$\sigma = \mathrm{Tr}_a\left(U \rho \otimes |0\rangle\langle 0|_a U^\dagger\right)
= \sum_i \langle i|_a U \rho \otimes |0\rangle\langle 0|_a U^\dagger |i\rangle_a
= \sum_i \langle i|U|0\rangle \rho \langle 0|U^\dagger|i\rangle ,$$

which has the same form as eq. (13) providing we define
$A_i := \langle i|U|0\rangle$. Thus, given a unitary evolution on the
extended Hilbert space, we can always find the corresponding
positive operators which describe the evolution
of the original system. Note that the choice of the operators
is not unique.
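
The plausibility argument can be checked numerically. The sketch below draws a random unitary on a system-plus-ancilla space, reads off the operators $A_i = \langle i|U|0\rangle$, and compares the operator-sum form with the "lift, evolve, trace out" route. The dimensions, the test state and the helper names are assumptions made for illustration; the system is taken to be the first tensor factor.

```python
import numpy as np


def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def kraus_from_unitary(U, d_sys, d_anc):
    """A_i = <i|_a U |0>_a, with the system as the first tensor factor."""
    U4 = U.reshape(d_sys, d_anc, d_sys, d_anc)  # indices (s', a', s, a)
    return [U4[:, i, :, 0] for i in range(d_anc)]


rng = np.random.default_rng(1)
d_sys, d_anc = 2, 3
U = random_unitary(d_sys * d_anc, rng)
A = kraus_from_unitary(U, d_sys, d_anc)

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)  # an arbitrary qubit state

# Route 1: operator-sum form, as in eq. (13).
sigma_kraus = sum(a @ rho @ a.conj().T for a in A)

# Route 2: lift to rho (x) |0><0|, evolve unitarily, trace out the ancilla, as in eq. (14).
zero = np.zeros((d_anc, d_anc), dtype=complex)
zero[0, 0] = 1.0
rho_E = np.kron(rho, zero)
evolved = (U @ rho_E @ U.conj().T).reshape(d_sys, d_anc, d_sys, d_anc)
sigma_unitary = np.einsum('iaja->ij', evolved)
```

Both routes give the same output state, and the operators satisfy $\sum_i A^\dagger_i A_i = 1$ because $U$ is unitary.
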

Finally, I would like to discuss another frequently used 
concept that is in some sense derived from the notion 
of CP-maps. It can be loosely stated that the CP-map 
represents the evolution of a quantum system when we
do not update the knowledge of its state based on the
particular measurement outcome. This is why we have a
summation over all measurements in eq. (13). If, on the
other hand, we know that the outcome corresponding to
the operator $A^\dagger_j A_j$ occurs, then the state of the system
immediately afterwards is given by $A_j \rho A^\dagger_j / \mathrm{tr}(A^\dagger_j A_j \rho)$.
This type of measurement is the most general one and
is commonly referred to as the Positive Operator Valued
Measure (POVM). It is positive because operators
of the form $A^\dagger A$ are always positive for any operator $A$
and taking the trace of it together with any density matrix
generates a positive number (i.e. a probability for
that particular measurement outcome). For a more detailed
overview of POVMs see Peres (1993). The concept
of POVM will play a significant role when defining the 
quantum relative entropy next. 

E. Quantum relative entropy 

When two subsystems become entangled the compos- 
ite state can be expressed as a superposition of the prod- 
uct of the corresponding Schmidt basis vectors. From 
eq. (12) it follows that the $i$-th vector of either subsystem
has a probability of $|g_i|^2$ associated with it. We
are, therefore, uncertain about the state of each subsys- 
tem, the uncertainty being larger if the probabilities are 
evenly distributed. Since the uncertainty in the prob- 
ability distribution is naturally described by the Shan- 
non entropy, this classical measure can also be applied 
in quantum theory. In an entangled system this entropy 
is related to a single observable. The general state of 
a quantum system, as I have already remarked, is described
by its density matrix $\rho$. If $A$ is an observable
pertaining to the system described by $\rho$, then by the
spectral decomposition theorem $A = \sum_i a_i P_i$, where $P_i$
is the projection onto the state with the eigenvalue $a_i$.
The probability of obtaining the eigenvalue $a_j$ is given
by $p_j = \mathrm{Tr}(\rho P_j) = \mathrm{Tr}(P_j \rho)$. The uncertainty in a given
observable can now be expressed through the Shannon
entropy. Let the observables $A$ and $B$, pertaining to
the subsystems U and V respectively, have a discrete,
non-degenerate spectrum, with corresponding probabilities
$p(a_i)$ and $p(b_j)$ of observable $A$ being $a_i$ and $B$ being
$b_j$. Let also the joint probability be $p(a_i, b_j)$. Then,



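$$S(A) = -\sum_i p(a_i) \ln p(a_i) \tag{15}$$

$$\phantom{S(A)} = -\sum_{ij} p(a_i, b_j) \ln \sum_j p(a_i, b_j) \tag{16}$$

$$S(B) = -\sum_j p(b_j) \ln p(b_j) \tag{17}$$

$$\phantom{S(B)} = -\sum_{ij} p(a_i, b_j) \ln \sum_i p(a_i, b_j) \tag{18}$$

$$S(A, B) = -\sum_{ij} p(a_i, b_j) \ln p(a_i, b_j) \tag{19}$$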

where I have used the fact that $\sum_j p(a_i, b_j) = p(a_i)$ and
$\sum_i p(a_i, b_j) = p(b_j)$. We have seen that a signature of
correlations is that the sum of the uncertainties in the 
individual subsystems is greater than the uncertainty in 
the total state. So, the Shannon mutual information is a 
good indicator of how much the two given observables are 
correlated. However, this quantity as it is inherently clas- 
sical describes the correlations between single observables 
only. The quantity that is related to the correlations in 
the overall state as a whole is the von Neumann mutual 
information. Since it is assigned to the state as a whole, 
it is of little surprise that it depends on the density ma- 
trix. First, however, I define the von Neumann entropy 
(von Neumann, 1932), which can be considered as the 
proper quantum analogue of the Shannon entropy (Ohya 
and Petz, 1993; Ingarden et al., 1997; Wehrl, 1978).
Definition. The von Neumann entropy of a quantum 
system described by a density matrix $\rho$ is defined as

$$S_N(\rho) := -\mathrm{Tr}(\rho \ln \rho) . \tag{20}$$

(I will drop the subscript N whenever there is no pos- 
sibility of confusion). The Shannon entropy is equal to 
the von Neumann entropy only when it describes the un- 
certainties in the values of the observables that commute 
with the density matrix, i.e. the Schmidt observables. 
Otherwise, 

$$S(A) \ge S_N(\rho)$$

where $A$ is any observable of a system described by $\rho$.
This means that there is more uncertainty in a single
observable than in the whole of the state, a fact which
entirely contradicts our expectations.

I now discuss a relation concerning the entropies of 
two subsystems. One part of it is somewhat analogous 
to its classical counterpart, but instead of referring to 
observables it is related to the two states. This inequality 
is called the Araki-Lieb inequality (Araki and Lieb, 1970) 
and is one of the most important results in the quantum 
theory of correlations. Let $\rho_A$ and $\rho_B$ be the reduced
density matrices of subsystems A and B respectively, and
$\rho$ be the matrix of a composite system, then:

$$S_N(\rho_A) + S_N(\rho_B) \ge S_N(\rho) \ge |S_N(\rho_A) - S_N(\rho_B)| .$$

Physically, the left hand side implies that we have more 
information (less uncertainty) in an entangled state than 
if the two states are treated separately. This arises nat- 
urally, since by treating the subsystems separately we 
have neglected the correlations (entanglement). We note 
that if the composite system is in a pure state, then 
$S(\rho) = 0$, and from the right hand side it follows that
$S(\rho_A) = S(\rho_B)$ (cf. Schmidt decomposition eq. (12)). To
appreciate the extent to which this is a counter-intuitive
result we consider the following example. Suppose a two
level atom is interacting with a single mode of an EM field
as in the Jaynes-Cummings model (Jaynes and Cummings,
1963). If the overall state is initially pure, and the
whole system is isolated then the entropies of the atom
and the field are equal at all times. But
this is not expected since the atom has only two degrees 
of freedom and the field infinitely many! This, however, 
is possible, as, by the second observation, the atom, as 
a two dimensional subsystem, is only entangled with two 
dimensions of the field. 

I present without proofs two important properties of 
entropy which will be used in the later sections (Wehrl, 
1978). These are: 

1. additivity: $S_N(\rho_A \otimes \rho_B) = S_N(\rho_A) + S_N(\rho_B)$; (21)

2. concavity: $S_N\left(\sum_i \lambda_i \rho_i\right) \ge \sum_i \lambda_i S_N(\rho_i)$; (22)

The first property is the same as in classical information 
theory, namely the entropies of independent systems add 
up. The concavity simply reflects the fact that "mixing 
increases uncertainty" . 

Following the definition of the Shannon mutual infor- 
mation I introduce the von Neumann mutual informa- 
tion, which refers to the correlation between the whole 
subsystems rather than relating two observables only.
Definition. The von Neumann mutual information between
the two subsystems $\rho_U$ and $\rho_V$ of the joint state
$\rho_{UV}$ is defined as

$$I_N(\rho_U : \rho_V ; \rho_{UV}) = S_N(\rho_U) + S_N(\rho_V) - S_N(\rho_{UV}) . \tag{23}$$

As in the case of the Shannon mutual information this 
quantity can be interpreted as a distance between two 
quantum states. For this I first need to define the von 
Neumann relative entropy, in a direct analogy with the 
Shannon relative entropy (in fact, this quantity was first 
considered by Umegaki (1962), but for consistency rea- 
sons I name it after von Neumann; I will also refer to it 
as the quantum relative entropy). 

Definition. The von Neumann relative entropy between
the two states $\sigma$ and $\rho$ is defined as

$$S_N(\sigma\|\rho) = \mathrm{Tr}\, \sigma (\ln \sigma - \ln \rho) . \tag{24}$$



This measure also has the same statistical interpretation
as its classical analogue: it tells us how difficult it is to
distinguish the state $\sigma$ from the state $\rho$ (Hiai and Petz,
1991). To that end, suppose we have two states $\sigma$ and $\rho$.
How can we distinguish them? We can choose a POVM
$\sum_i A_i = 1$ which generates two distributions via

$$p_i = \mathrm{tr} A_i \sigma \tag{25}$$

$$q_i = \mathrm{tr} A_i \rho , \tag{26}$$

and use classical reasoning to distinguish these two dis- 
tributions. However, the choice of POVM's is not unique. 



It is therefore best to choose that POVM which distin- 
guishes the distributions most, i.e. for which the classical 
relative entropy is largest. Thus we arrive at the follow- 
ing quantity 

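$$S_1(\sigma\|\rho) := \sup_{A\text{'s}} \sum_i \left\{ \mathrm{tr} A_i \sigma \ln \mathrm{tr} A_i \sigma - \mathrm{tr} A_i \sigma \ln \mathrm{tr} A_i \rho \right\} ,$$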

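where the supremum is taken over all POVM's. The
above is not the most general measurement that we can
make, however. In general we have $N$ copies of $\sigma$ and $\rho$
in the state

$$\sigma^N = \underbrace{\sigma \otimes \sigma \otimes \cdots \otimes \sigma}_{\text{total of } N \text{ terms}} \tag{27}$$

$$\rho^N = \underbrace{\rho \otimes \rho \otimes \cdots \otimes \rho}_{\text{total of } N \text{ terms}} \tag{28}$$

We may now apply a POVM $\sum_i A_i = 1$ acting on $\sigma^N$
and $\rho^N$. Consequently, we define a new type of relative
entropy

$$S_N(\sigma\|\rho) := \sup_{A\text{'s}} \frac{1}{N} \sum_i \left\{ \mathrm{tr} A_i \sigma^N \ln \mathrm{tr} A_i \sigma^N - \mathrm{tr} A_i \sigma^N \ln \mathrm{tr} A_i \rho^N \right\} \tag{29}$$

Now it can be shown that (Donald, 1986)

$$S(\sigma\|\rho) \ge S_N \tag{30}$$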

where $S(\sigma\|\rho)$ is the quantum relative entropy. (This really
is a consequence of the fact that the relative entropy
does not increase under general CP-maps, a fact that
will be proven later on in this subsection). Equality is
achieved in eq. (30) iff $\sigma$ and $\rho$ commute (Fuchs, 1996).
However, for any $\sigma$ and $\rho$ it is true that (Hiai and Petz,
1991)

$$S(\sigma\|\rho) = \lim_{N \to \infty} S_N .$$

In fact, this limit can be achieved by projective measurements
which are independent of $\sigma$ (Hayashi, 1997).
From these considerations it would naturally follow that
the probability of confusing two quantum states $\sigma$ and
$\rho$ (after performing $N$ measurements on $\rho$) is (for large
$N$):

$$P_N(\rho \to \sigma) = e^{-N S(\sigma\|\rho)} . \tag{31}$$

We would like to stress here that classical statistical rea- 
soning applied to distinguishing quantum states leads 
to the above formula. There are, however, other approaches.
Some take eq. (31) for their starting point and
then derive the rest of the formalism thenceforth (Hiai 
and Petz, 1991). Others, on the other hand, assume a 
set of axioms that are necessary to be satisfied by the 
quantum analogue of the relative entropy (e.g. it should 
reduce to the classical relative entropy if the density operators
commute, i.e. if they are "classical") and then
derive eq. (31) as a consequence (Donald, 1986). In any 
case, as we have argued here, there is a strong reason to 
believe that the quantum relative entropy $S(\sigma\|\rho)$ plays
the same role in quantum statistics as the classical rela- 
tive entropy plays in classical statistics (see also a review 
by Schumacher and Westmoreland, 2000). 

Now, the von Neumann mutual information can be 
understood as a distance of the state $\rho_{UV}$ to the uncorrelated
state $\rho_U \otimes \rho_V$:

$$I_N(\rho_U : \rho_V ; \rho_{UV}) = S_N(\rho_{UV}\|\rho_U \otimes \rho_V) .$$
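
The identity follows in a few lines from the definitions of the relative entropy and the mutual information. The derivation below is a sketch that assumes $\rho_U$ and $\rho_V$ have full rank (or that all logarithms are restricted to the relevant supports), so that $\ln(\rho_U \otimes \rho_V) = \ln\rho_U \otimes 1 + 1 \otimes \ln\rho_V$.

```latex
\begin{align*}
S_N(\rho_{UV}\|\rho_U \otimes \rho_V)
 &= \mathrm{Tr}\,\rho_{UV}\ln\rho_{UV}
    - \mathrm{Tr}\,\rho_{UV}\bigl(\ln\rho_U \otimes 1 + 1 \otimes \ln\rho_V\bigr) \\
 &= -S_N(\rho_{UV}) - \mathrm{Tr}\,\rho_U\ln\rho_U - \mathrm{Tr}\,\rho_V\ln\rho_V \\
 &= S_N(\rho_U) + S_N(\rho_V) - S_N(\rho_{UV}) .
\end{align*}
```

The second line uses the fact that tracing $\rho_{UV}$ against an operator acting on U alone only involves the reduced state $\rho_U$, and likewise for V.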

The quantum relative entropy will be the most important 
quantity in classifying and quantifying quantum correla- 
tions. It will be seen that this quantity does not increase 
under CP-maps, which are quantum analogues of the
stochastic processes. I list three properties of the rela- 
tive entropy whose proof is left to the reader: 

F1. Unitary operations leave $S(\sigma\|\rho)$ invariant, i.e.
$S(\sigma\|\rho) = S(U \sigma U^\dagger\|U \rho U^\dagger)$. Unitary transformations
represent a change of basis (i.e. a change in
our "perspective") and the distance between two
states should not (and does not in this case) change
under this.

F2. $S(\mathrm{Tr}_p \sigma\|\mathrm{Tr}_p \rho) \le S(\sigma\|\rho)$, where $\mathrm{Tr}_p$ is a partial
trace. Tracing over a part of the system leads to a
loss of information. The less information we have
about two states, the harder they are to distinguish
which is what this inequality says.

F3. The relative entropy is additive: $S(\sigma_1 \otimes \sigma_2\|\rho_1 \otimes \rho_2)
= S(\sigma_1\|\rho_1) + S(\sigma_2\|\rho_2)$. This equality is a
consequence of additivity of entropy itself.

These I now show have profound implication for the evo- 
lution of quantum systems. 

Quantum distinguishability never increases. For
any completely positive, trace preserving map $\Phi$ given
by $\Phi\sigma = \sum_i V_i \sigma V^\dagger_i$ and $\sum_i V^\dagger_i V_i = 1$, we have that
$S(\Phi\sigma\|\Phi\rho) \le S(\sigma\|\rho)$.

I will first present a physical argument as to why we 
should expect this theorem to hold. As I have discussed, 
a CP-map can be represented as a unitary transformation 
on an extended Hilbert space. According to F1, unitary
transformations do not change the relative entropy be- 
tween two states. However, after this, we have to perform 
a partial tracing to go back to the original Hilbert space 
which, according to F2, decreases the relative entropy as 
some information is invariably lost during this operation. 
Hence the relative entropy decreases under any CP-map. 
I now formalise this proof. 

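Proof. I have discussed the fact that a CP-map can
always be represented as a unitary operation + partial
tracing on an extended Hilbert space $\mathcal{H} \otimes \mathcal{H}_n$, where
$\dim \mathcal{H}_n = n$ (Lindblad, 1974; 1975). Let $\{|i\rangle\}$ be an orthonormal
basis in $\mathcal{H}_n$ and $|\alpha\rangle$ be a unit vector. So I
define,

$$W = \sum_i V_i \otimes |i\rangle\langle\alpha| . \tag{32}$$

Then, $W^\dagger W = 1 \otimes P_\alpha$, where $P_\alpha = |\alpha\rangle\langle\alpha|$, and there is a
unitary operator $U$ in $\mathcal{H} \otimes \mathcal{H}_n$ such that $W = U(1 \otimes P_\alpha)$
(Reed and Simon, 1980). Consequently,

$$U(A \otimes P_\alpha)U^\dagger = \sum_{ij} V_i A V^\dagger_j \otimes |i\rangle\langle j| , \tag{33}$$

so that,

$$\mathrm{Tr}_2\{U(A \otimes P_\alpha)U^\dagger\} = \sum_i V_i A V^\dagger_i .$$

This shows that the unitary and $\sum_i V_i \rho V^\dagger_i$ representations
are equivalent. Now using F2, then F1, and finally
F3 we find the following

$$S(\mathrm{Tr}_2\{U(\sigma \otimes P_\alpha)U^\dagger\}\|\mathrm{Tr}_2\{U(\rho \otimes P_\alpha)U^\dagger\})$$

$$\le S(U(\sigma \otimes P_\alpha)U^\dagger\|U(\rho \otimes P_\alpha)U^\dagger)$$

$$= S(\sigma \otimes P_\alpha\|\rho \otimes P_\alpha)$$

$$= S(\sigma\|\rho) . \tag{34}$$

This proves the result □.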
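
A numerical illustration of the theorem is sketched below. The depolarizing channel used here is not taken from the text; it is a hypothetical example of a completely positive, trace preserving map chosen because it is easy to write down. The random states are made full rank so that all logarithms are finite.

```python
import numpy as np


def random_state(dim, rng):
    """Full-rank random density matrix (so that all logarithms are finite)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T + 1e-3 * np.eye(dim)
    return rho / np.trace(rho).real


def logm_hermitian(rho):
    """Matrix logarithm of a positive-definite Hermitian matrix via its spectrum."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T


def relative_entropy(sigma, rho):
    """S(sigma||rho) = Tr sigma (ln sigma - ln rho)."""
    return np.trace(sigma @ (logm_hermitian(sigma) - logm_hermitian(rho))).real


def depolarize(rho, lam):
    """Hypothetical example channel: rho -> (1 - lam) rho + lam Tr(rho) I/d."""
    d = rho.shape[0]
    return (1 - lam) * rho + lam * np.trace(rho).real * np.eye(d) / d


rng = np.random.default_rng(2)
sigma, rho = random_state(3, rng), random_state(3, rng)
before = relative_entropy(sigma, rho)
after = relative_entropy(depolarize(sigma, 0.3), depolarize(rho, 0.3))
```

For any mixing parameter between 0 and 1 the value `after` does not exceed `before`, in line with $S(\Phi\sigma\|\Phi\rho) \le S(\sigma\|\rho)$.
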

Corollary. Since for a complete set of orthonormal projectors
$P_i$, $\sum_i P_i \sigma P_i$ is a CP-map, then

$$\sum_i S(P_i \sigma P_i\|P_i \rho P_i) \le S(\sigma\|\rho) . \tag{35}$$

(The sum can be taken outside as it can be easily shown
that $S(\sum_i P_i \sigma P_i\|\sum_i P_i \rho P_i) = \sum_i S(P_i \sigma P_i\|P_i \rho P_i)$.)

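Now from F1, F2, F3 and eq. (35) we have the following

Theorem 5. If $\sigma_i = V_i \sigma V^\dagger_i$ then $\sum_i S(\sigma_i\|\rho_i) \le S(\sigma\|\rho)$,
where $\rho_i = V_i \rho V^\dagger_i / \mathrm{tr}(V_i \rho V^\dagger_i)$.

Proof. Equations (32) and (33) are introduced as in the
previous proof. From eq. (33) we have that

$$\mathrm{Tr}_2\{(1 \otimes P_i)\, U(A \otimes P_\alpha)U^\dagger\, (1 \otimes P_i)\} = V_i A V^\dagger_i ,$$

where $P_i = |i\rangle\langle i|$. Now, from F2, the Corollary and F3 it
follows that

$$\sum_i S(\mathrm{Tr}_2\{(1 \otimes P_i)\, U(\sigma \otimes P_\alpha)U^\dagger\, (1 \otimes P_i)\}\|\mathrm{Tr}_2\{(1 \otimes P_i)\, U(\rho \otimes P_\alpha)U^\dagger\, (1 \otimes P_i)\})$$

$$\le \sum_i S((1 \otimes P_i)\, U(\sigma \otimes P_\alpha)U^\dagger\, (1 \otimes P_i)\|(1 \otimes P_i)\, U(\rho \otimes P_\alpha)U^\dagger\, (1 \otimes P_i))$$

$$\le S(U(\sigma \otimes P_\alpha)U^\dagger\|U(\rho \otimes P_\alpha)U^\dagger)$$

$$= S(\sigma \otimes P_\alpha\|\rho \otimes P_\alpha)$$

$$= S(\sigma\|\rho) . \tag{36}$$

This proves Theorem 5 □. This theorem will be important
in the next section. A simple consequence of the fact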
that the quantum relative entropy itself does not increase 
under CP-maps is that correlations (as measured by the 
quantum mutual information) also cannot increase but 
now under local CP-maps. 

Correlations cannot increase without interaction. 

Correlations, as measured by the von Neumann mutual 
information, do not increase during local complete measurements
carried out on two entangled quantum systems.

The Shannon mutual information, although having this
desired property, does not distinguish between the quantum
and classical correlations (rather, it measures total
correlations) . In order to do this I will have to introduce 
the possibility of classical communication between A and 
B. This will allow classical correlations to increase while 
leaving quantum correlations intact, as will be seen in 
the following section. Now we put the theory developed 
so far to practical use: communication. 

Digression on the Second Law of Thermodynamics. The Second
Law of Thermodynamics states that the entropy of an isolated
system never decreases. This does not follow directly from
the non-increase of the quantum relative entropy under
CP-maps. Strictly speaking, an isolated system in quantum
mechanics evolves unitarily and therefore its entropy never
changes. Under CP-maps, on the other hand, the entropy can
increase as well as decrease. If, however, the state $\rho$
is maximally mixed, $\rho = I/n$ for example, then the
quantum relative entropy is given by:

$$S(\sigma\|\rho) = \ln n - S(\sigma) . \qquad (37)$$

If in addition the evolution is such that $I/n$ is the
equilibrium state, then the monotone decrease in the quantum
relative entropy implies a monotone increase in $S(\sigma)$,
just as in the Second Law of Thermodynamics. Otherwise,
the entropy itself can increase as well as decrease.
A detailed discussion of the statistical foundations of the
Second Law can be found in Tolman's classic "The Principles
of Statistical Mechanics" (Tolman, 1938).
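
Eq. (37) and the resulting entropy increase can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the initial state `sigma`, the mixing strength `lam` and the helper `mix_towards_equilibrium` (one simple CP-map with $I/n$ as its fixed point) are hypothetical choices made only for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr(rho ln rho), in nats."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # 0 ln 0 is taken to be 0
    return float(-np.sum(evals * np.log(evals)))

def relative_entropy(sigma, rho):
    """S(sigma||rho) = Tr sigma (ln sigma - ln rho); rho must have full rank."""
    r_vals, r_vecs = np.linalg.eigh(rho)
    log_rho = r_vecs @ np.diag(np.log(r_vals)) @ r_vecs.conj().T
    # Tr(sigma ln sigma) equals -S(sigma)
    return float(-entropy(sigma) - np.real(np.trace(sigma @ log_rho)))

n = 3
rho = np.eye(n) / n                        # maximally mixed state I/n
sigma = np.diag([0.7, 0.2, 0.1])           # hypothetical initial state

def mix_towards_equilibrium(state, lam=0.3):
    """Illustrative CP-map whose equilibrium state is I/n."""
    return (1 - lam) * state + lam * np.eye(n) / n

sigma_next = mix_towards_equilibrium(sigma)

print(relative_entropy(sigma, rho), np.log(n) - entropy(sigma))
print(entropy(sigma), entropy(sigma_next))
```

The two numbers on the first printed line agree, as eq. (37) requires, and the second line shows the entropy growing as the relative entropy to $I/n$ shrinks.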

III. QUANTUM COMMUNICATION: CLASSICAL USE

The central objective of communication theory is to al- 
low a person, often referred to as Alice, to communicate 
accurately with another person, called Bob, even in the 
presence of noise. Alice encodes her message into a num- 
ber of different (distinguishable) states, with each state 
representing a different symbol in the message. For ex- 
ample, Alice encodes the bit value 1 into the excited state 
of a two-level atom and sends this atom to Bob. On its
way to Bob the atom may transform into its ground state
due to either stimulated or spontaneous emission, thereby
giving Bob the impression that Alice transmitted 0. This
unwanted state transition is a form of channel noise.

The key question is: what is the largest amount of 
information (per symbol) that Alice can send to Bob, 
i.e. what is the capacity of the communication channel 
taking into account any possible noise? In classical infor- 
mation theory the capacity for communication is given 
by the mutual information between Alice's sent message 
and Bob's received message (Shannon and Weaver, 1949). 
This is intuitively clear, since mutual information quanti- 
fies correlations between sent and received messages and 
it thus tells us how faithful the transmission is. If we 
use quantum states to encode symbols, then the capac- 
ity is not given by the quantum mutual information we 
introduced before. We derive a new quantity for this pur- 
pose called the Holevo bound (Holevo, 1973). The benefit 
of performing the full quantum derivation is that this is 
a more fundamental approach to information processing. 
We can then deduce the classical capacity as a special 
case. 



A. Holevo bound 

A quantum communication channel (QCC) consists of a number,
$N$, of quantum systems prepared in states
$\rho_1, \rho_2, \ldots, \rho_N$ and whatever physical medium
is used to send the states from Alice to Bob. These states
encode $N$ different symbols with certain a priori
probabilities, $p_1, p_2, \ldots, p_N$. Bob then performs a
set of measurements to determine the correct sequence of
states comprising Alice's symbols, which he can then use to
reconstruct the entire message (Ingarden, 1976). If the
states suffer no error on the way to Bob, then the channel is
called noiseless; otherwise it is called noisy. I only
consider the notion of capacity of a noiseless QCC, since the
generalization to a noisy channel is straightforward.

Let $S(\rho) = -\mathrm{Tr}\,\rho\ln\rho$ be the standard von
Neumann entropy of a density matrix $\rho$. Then, the
capacity of a QCC is defined as

$$C := \max_{\{p\}} C(\{p\},\rho)$$

where

$$C(\{p\},\rho) = S\Big(\sum_i p_i\rho_i\Big) - \sum_i p_i S(\rho_i) \qquad (38)$$

is the Holevo bound. Note that the above can be expressed
succinctly as

$$C(\{p\},\rho) = \sum_i p_i S(\rho_i\|\rho) , \qquad (39)$$

where $S(\,\cdot\,\|\,\cdot\,)$ is the von Neumann relative
entropy and $\rho = \sum_i p_i\rho_i$. When there is no
possibility of confusion I write $C(\{p\},\rho) = C(\{p\})$.
The reader may ask why we need to maximise over the symbol
probabilities in order to compute the capacity. This is
because the channel can be used with different input
probabilities and the capacity represents the maximum that
can be communicated using this channel.
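
The equivalence of eqs. (38) and (39) is easy to confirm numerically. The sketch below assumes NumPy; the two mixed qubit states and the equal prior probabilities are hypothetical values chosen so that the average state has full rank.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in nats."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def relative_entropy(sigma, rho):
    """S(sigma||rho); rho must have full rank."""
    r_vals, r_vecs = np.linalg.eigh(rho)
    log_rho = r_vecs @ np.diag(np.log(r_vals)) @ r_vecs.conj().T
    return float(-entropy(sigma) - np.real(np.trace(sigma @ log_rho)))

def average_state(probs, states):
    return sum(p * r for p, r in zip(probs, states))

def holevo_entropy_form(probs, states):
    """Eq. (38): S(sum_i p_i rho_i) - sum_i p_i S(rho_i)."""
    rho = average_state(probs, states)
    return entropy(rho) - sum(p * entropy(r) for p, r in zip(probs, states))

def holevo_relative_form(probs, states):
    """Eq. (39): sum_i p_i S(rho_i || rho)."""
    rho = average_state(probs, states)
    return sum(p * relative_entropy(r, rho) for p, r in zip(probs, states))

# Hypothetical ensemble: two non-commuting mixed qubit states.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
rho_0 = np.diag([0.9, 0.1])
rho_1 = 0.9 * np.outer(plus, plus) + 0.1 * np.outer(minus, minus)
probs = [0.5, 0.5]

chi_38 = holevo_entropy_form(probs, [rho_0, rho_1])
chi_39 = holevo_relative_form(probs, [rho_0, rho_1])
print(chi_38, chi_39)
```

Both forms return the same value for this fixed ensemble; the capacity itself would additionally require a maximisation over `probs`, which is not attempted here.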

To see the physical motivation behind this quantity
consider $N$ states $\rho_1, \ldots, \rho_N$ sent by Alice to
Bob according to probabilities $p_1, \ldots, p_N$
respectively. Bob now performs a set of complete
measurements $\sum_i E_i = I$, where $E_i \ge 0$, in order to
determine which state was sent to him (a complete
measurement is like a CP-map, but where we record each of
the outcomes). The accessible information to Bob is given by
the mutual information between his measurement and
$\rho_1, \ldots, \rho_N$ (Holevo, 1973; Davies, 1976).
This quantity tells us how well Bob's measurement can
distinguish between the message states and is given by

$$I(E:\rho) = \sum_i \Big[ -\mathrm{Tr}(\rho E_i)\ln\mathrm{Tr}(\rho E_i) + \sum_j p_j \mathrm{Tr}(\rho_j E_i)\ln\mathrm{Tr}(\rho_j E_i) \Big] .$$

The rationale behind this expression is that the uncertainty
in the message before any measurement is performed is given
by the first term and the second term represents the
uncertainty after the measurement has identified (partially
in general) the message states. The Holevo bound is an upper
bound to the above accessible information, i.e.

$$S\Big(\sum_i p_i\rho_i\Big) - \sum_i p_i S(\rho_i) \ge \max_E I(E:\rho) . \qquad (40)$$

This inequality is saturated if and only if
$[\rho_i, \rho_j] = 0$ for all $i$ and $j$. Therefore, since
the Holevo bound is an upper bound to the accessible
information that Bob can gain about Alice's message, we
identify its maximum over all possible initial probabilities
with the classical capacity of a quantum channel.
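
Inequality (40) can be illustrated for one particular measurement. The sketch below assumes NumPy and uses a hypothetical ensemble of the two non-orthogonal pure states $|0\rangle$ and $|+\rangle$ measured in the computational basis; it evaluates $I(E:\rho)$ for that single measurement only and does not perform the maximisation over all measurements.

```python
import numpy as np

def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def holevo(probs, states):
    rho = sum(p * r for p, r in zip(probs, states))
    return entropy(rho) - sum(p * entropy(r) for p, r in zip(probs, states))

def mutual_information(probs, states, povm):
    """I(E:rho) in nats for measurement elements E_i."""
    rho = sum(p * r for p, r in zip(probs, states))
    total = 0.0
    for E in povm:
        q = np.real(np.trace(rho @ E))           # Tr(rho E_i)
        if q > 1e-12:
            total -= q * np.log(q)
        for p, r in zip(probs, states):
            c = np.real(np.trace(r @ E))         # Tr(rho_j E_i)
            if c > 1e-12:
                total += p * c * np.log(c)
    return float(total)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)

probs = [0.5, 0.5]
states = [np.outer(ket0, ket0), np.outer(plus, plus)]
z_measurement = [np.outer(ket0, ket0), np.outer(ket1, ket1)]

info = mutual_information(probs, states, z_measurement)
chi = holevo(probs, states)
print(info, chi)
```

Because the two message states do not commute, the information gained by this measurement stays strictly below the Holevo quantity.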

The Holevo bound has an even more suggestive form:
the uncertainty in the initial message is $S(\rho)$, but
after the states are correctly identified the average
uncertainty is $\sum_i p_i S(\rho_i)$. The difference between
these two quantities when maximised over all $p_i$ is the
classical communication capacity of a quantum channel. Note
that one of the most profound implications of the Holevo
bound is that a quantum bit cannot store more information
than a classical bit. In spite of this limitation, quantum
information processing is more efficient than its classical
analogue. This is due to the different nature of information
encoding, which is reflected in the existence of
superpositions of different states as well as entanglement
between different qubits (see also section on dense coding).

Proof of the Holevo bound in eq. (40). The Holevo
bound is a direct consequence of the fact that the quantum
relative entropy does not increase under CP-maps, as in
Theorem 1 (note that Holevo's original proof is much
more complicated and does not involve the quantum relative
entropy; here I follow Yuen and Ozawa in spirit, as in the
last reference of Holevo (1973)). One such map is

$$\tau(A) = \frac{1}{n}\mathrm{Tr}(A)$$

where $A$ is any $n \times n$ positive matrix. This leads to
the Peierls-Bogoliubov inequality (PBI) (Bhatia, 1997)

$$\tau(A)(\ln\tau(A) - \ln\tau(B)) \le \tau(A\ln A - A\ln B) . \qquad (41)$$

To prove the Holevo bound I first use the fact that
(Theorem 5)

$$S(\rho_i\|\rho) \ge \sum_j S(A_j\rho_i A_j^\dagger \,\|\, A_j\rho A_j^\dagger) .$$

PBI now implies that

$$S(A_j\rho_i A_j^\dagger \,\|\, A_j\rho A_j^\dagger) \ge \mathrm{Tr}(A_j\rho_i A_j^\dagger)\{\ln\mathrm{Tr}(A_j\rho_i A_j^\dagger) - \ln\mathrm{Tr}(A_j\rho A_j^\dagger)\}$$
$$= p(j|i)(\ln p(j|i) - \ln p(j)) ,$$

where $p(j|i) = \mathrm{Tr}(A_j\rho_i A_j^\dagger)$ is the
conditional probability that the message $\rho_i$ will lead
to the outcome $E_j = A_j^\dagger A_j$ and
$p(j) = \sum_i p_i\, p(j|i)$. Thus we now have that

$$S(\rho_i\|\rho) \ge \sum_j p(j|i)(\ln p(j|i) - \ln p(j)) .$$

Multiplying both sides by the (positive) $p_i$ and summing
over all $i$ leads to the Holevo bound □: the left-hand side
becomes eq. (39) and the right-hand side becomes the mutual
information $I(E:\rho)$.

Since Holevo's result is one of the key results in quantum
information theory I present another simple way of
understanding it via the quantum mutual information. This,
of course, is only an additional motivation for the Holevo
bound and by no means proves its validity. Namely, if Alice
encodes the symbol (sym) $i$ into the state (st) $\rho_i$,
then the total state (sym + st) is

$$\rho_{sym+st} = \sum_i p_i |i\rangle\langle i| \otimes \rho_i ,$$

where the kets $|i\rangle$ are orthogonal (we can think of
these as representing different states of consciousness of
Alice!). Bob now wants to learn about the symbols by
distinguishing the states $\rho_i$. He cannot learn more
about the symbols than is already stored in the correlations
between the symbols and the message states. This, as we
know, is given by the quantum mutual information

$$I(\rho_{sym+st}) = S(Sym) + S(St) - S(Sym + St)$$
$$= S\Big(\sum_i p_i\rho_i\Big) - \sum_i p_i S(\rho_i) \qquad (42)$$

which is the same as the Holevo bound.
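
Eq. (42) can be verified by constructing the symbol-plus-state density matrix explicitly. This is a minimal sketch assuming NumPy; the priors and the two message states are hypothetical, and `cq_state` and `quantum_mutual_information` are illustrative helper names.

```python
import numpy as np

def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def cq_state(probs, states):
    """sum_i p_i |i><i| (x) rho_i, with the symbol register first."""
    n, d = len(probs), states[0].shape[0]
    total = np.zeros((n * d, n * d), dtype=complex)
    for i, (p, r) in enumerate(zip(probs, states)):
        symbol = np.zeros((n, n))
        symbol[i, i] = 1.0                        # projector |i><i|
        total += p * np.kron(symbol, r)
    return total

def quantum_mutual_information(joint, n, d):
    t = joint.reshape(n, d, n, d)
    rho_sym = np.trace(t, axis1=1, axis2=3)       # trace out the message state
    rho_st = np.trace(t, axis1=0, axis2=2)        # trace out the symbol register
    return entropy(rho_sym) + entropy(rho_st) - entropy(joint)

plus = np.array([1.0, 1.0]) / np.sqrt(2)
probs = [0.3, 0.7]
states = [np.diag([0.9, 0.1]), np.outer(plus, plus)]

joint = cq_state(probs, states)
mutual = quantum_mutual_information(joint, n=2, d=2)

rho = sum(p * r for p, r in zip(probs, states))
chi = entropy(rho) - sum(p * entropy(r) for p, r in zip(probs, states))
print(mutual, chi)
```

The agreement follows because $S(Sym)$ is the Shannon entropy of the priors, which cancels against the same term inside $S(Sym + St)$.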

I would now like to derive the capacity of a classical
communication channel from the Holevo bound. I follow the
reasoning of Gordon, who was, in fact, the first person
to conjecture the Holevo bound (Gordon, 1964). As I
mentioned before, the Holevo bound itself contains the
classical capacity of a classical channel as a special case.
This, as we might expect, happens when all $\rho_i$ are
diagonal in the same basis, i.e. they commute (classically
all the states and observables commute because they can be
simultaneously specified and measured, which is in contrast
with quantum mechanics). Therefore density matrices are
reduced to classical probability distributions. Let us call
this basis the $B$ representation, with orthonormal
eigenvectors $|b\rangle$. Then the probability that the
measurement of the symbol represented by $\rho_i$ will yield
the value $b$ is just $\langle b|\rho_i|b\rangle$. This I
call the conditional probability $p_i(b)$, that if $\rho_i$
was sent the result $b$ was obtained. Now the Holevo bound is

$$C = S(\rho) - \sum_i p_i S(\rho_i) = S(\rho) - S_B(\rho_i) ,$$

where $S_B(\rho_i)$ is the conditional entropy given by

$$S_B(\rho_i) = -\sum_i p_i \sum_b \langle b|\rho_i|b\rangle \ln\langle b|\rho_i|b\rangle = \sum_i p_i S(\rho_i) .$$

Thus, the Holevo bound reduces to the Shannon mutual
information between the commuting messages and the
measurement in the $B$ representation.
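
The reduction to the Shannon mutual information can be checked for a small classical channel. The sketch below assumes NumPy; the prior probabilities and the table `cond` of conditional probabilities $p_i(b)$ are hypothetical numbers.

```python
import numpy as np

def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

priors = np.array([0.6, 0.4])
cond = np.array([[0.9, 0.1],      # cond[i, b] = p_i(b) = <b|rho_i|b>
                 [0.2, 0.8]])

# Commuting message states: each rho_i is diagonal in the B representation.
states = [np.diag(row) for row in cond]
rho = sum(p * r for p, r in zip(priors, states))
holevo = entropy(rho) - sum(p * entropy(r) for p, r in zip(priors, states))

# Shannon mutual information computed directly from the joint distribution.
joint = priors[:, None] * cond                     # p(i, b)
p_b = joint.sum(axis=0)                            # p(b)
shannon_mutual = float(np.sum(joint * np.log(joint / (priors[:, None] * p_b[None, :]))))

print(holevo, shannon_mutual)
```

For diagonal states the von Neumann entropies are just Shannon entropies of the diagonals, so the two numbers coincide.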

In general, the usual rule of thumb for obtaining quantum
information theoretic quantities from their classical
counterparts is by the convention

$$\sum \rightarrow \mathrm{Trace} , \qquad p(a) \rightarrow \rho_A ,$$

so that, for example, the Shannon entropy
$S(p(a)) = -\sum_i p(a_i)\ln p(a_i)$ now becomes the von
Neumann entropy $S(\rho_A) = -\mathrm{Tr}\,\rho_A\ln\rho_A$.

Example. As the first application of the Holevo bound
I will compute the channel capacity of a bosonic field,
e.g. the electromagnetic field (for an excellent review see
Caves and Drummond, 1994). The message information will now
be encoded into modes of frequency $\omega$ and average
photon number $\bar{m}(\omega)$. The signal power is assumed
to be $S$. The noise in the channel is quantified by the
average number of excitations $n(\omega)$ and is assumed to
be independent of the signal (i.e. the power of signal and
noise is additive). We saw that when there is no noise in
the channel the Holevo bound is equal to the entropy of
the average signal. In order to compute the capacity we
need to maximize this entropy with the constraint that
the total power (or energy) is fixed. It is well known that
thermal states are those that maximize the entropy. We
thus assume that both the noise and signal+noise are in
thermal equilibrium and follow the usual Bose-Einstein
statistics. The noise power is

$$N = \frac{\pi (kT)^2}{12\hbar} .$$

The power of the output of the channel (signal+noise) is

$$P = S + N = \frac{\pi (kT_e)^2}{12\hbar} ,$$

where $T_e$ is the equilibrium temperature of signal+noise.
Therefore it follows that

$$T_e = (12\hbar S/\pi k^2 + T^2)^{1/2} .$$

The state of the noise in the mode $\omega$ is

$$\rho_N(\omega) = (1 - e^{-\hbar\omega/kT}) \sum_n e^{-n\hbar\omega/kT}\, |n\rangle\langle n| ,$$

while the state of the output is

$$\rho_{N+S}(\omega) = (1 - e^{-\hbar\omega/kT_e}) \sum_n e^{-n\hbar\omega/kT_e}\, |n\rangle\langle n| .$$

The capacity of the channel is given by the Holevo bound
which is

$$C = \int [S(\rho_{S+N}(\omega)) - S(\rho_N(\omega))]\, d\omega = \frac{\pi kT}{6\hbar\ln 2}\left[\left(\frac{12\hbar S}{\pi (kT)^2} + 1\right)^{1/2} - 1\right] . \qquad (43)$$

The integration is there to take into account all the modes
of the field. Let us look at the two extreme limits of this
capacity. In the high temperature limit we obtain the
"classical" capacity

$$C_C = \frac{S}{kT\ln 2} , \qquad (44)$$

a result derived by Shannon and Weaver (1949). This
states that in order to communicate one bit of information
with this set-up we need exactly $kT\ln 2$ of energy. In the
low temperature limit, on the other hand, quantum effects
become important and the capacity becomes independent of $T$

$$C_Q = \frac{1}{\ln 2}\left(\frac{\pi S}{3\hbar}\right)^{1/2} , \qquad (45)$$

which was derived by Stern (1960), Gordon (1964), Lebedev
and Levitin (1963) and Yamamoto and Haus (1986) among
others. Note also the appearance of Planck's constant, which
is a key feature of quantum mechanics. If we wish to
communicate one bit of information in this limit we need
only $\hbar/\pi(\ln 2) \approx 10^{-34}$ joules of energy.
This is significantly less than the corresponding energy
in the classical limit. Let us now compare the classical
and quantum capacity limits to the total energy of $N$
harmonic oscillators (bosons) in the same two limits. In
the high temperature limit the equipartition theorem is
applicable and the total energy is $3NkT$ (i.e. it depends
on temperature). In the low temperature limit all the
harmonic oscillators settle down to the ground state so
that the total energy becomes $N\hbar\omega/2$ (i.e. it is
independent of temperature and we see the quantum dependence
through Planck's constant $\hbar$).
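
The two limits of eq. (43) can be checked numerically. The sketch below uses only the Python standard library; the signal power and the two temperatures are hypothetical values picked to sit deep in each regime, and the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J / K

def capacity(signal_power, temperature):
    """Eq. (43) in bits per second."""
    kT = K_B * temperature
    x = 12 * HBAR * signal_power / (math.pi * kT ** 2)
    # sqrt(1 + x) - 1 rewritten as x / (sqrt(1 + x) + 1) to avoid cancellation
    bracket = x / (math.sqrt(1 + x) + 1)
    return math.pi * kT / (6 * HBAR * math.log(2)) * bracket

def classical_limit(signal_power, temperature):
    """Eq. (44): high temperature limit."""
    return signal_power / (K_B * temperature * math.log(2))

def quantum_limit(signal_power):
    """Eq. (45): low temperature limit, independent of temperature."""
    return math.sqrt(math.pi * signal_power / (3 * HBAR)) / math.log(2)

power = 1e-12                                    # hypothetical signal power, W
hot = capacity(power, 300.0)                     # room temperature
cold = capacity(power, 1e-3)                     # one millikelvin

print(hot, classical_limit(power, 300.0))
print(cold, quantum_limit(power))
```

At room temperature the exact expression is indistinguishable from the classical limit for this power, while at a millikelvin it approaches the temperature-independent quantum limit from below.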

B. Schumacher's compression

Optimal communication through a noiseless channel using pure
states is equivalent to data compression. We have seen in
eq. (3) that the limit to classical data compression is
given by the entropy of the probability distribution of the
data. We would thus guess that the limit to quantum data
compression is given by the von Neumann entropy of the set
of states being compressed. This, in fact, turns out to be a
correct guess, as first proven by Schumacher (1995). So,
Alice now encodes letters of her classical message into pure
quantum states and sends these to Bob. For example if
$a \rightarrow |\psi_a\rangle$ and
$b \rightarrow |\psi_b\rangle$, then Alice's message aab will
be sent to Bob as the sequence of pure quantum states
$|\psi_a\rangle|\psi_a\rangle|\psi_b\rangle$.
The exact problem can be phrased in the following
equivalent fashion: suppose a quantum source randomly
prepares different qubit states $|\psi_i\rangle$ with the
corresponding probabilities $p_i$. A random sequence of $n$
such states is produced. By how much can this be compressed,
i.e. how many qubits do we really need to encode the
original sequence (in the limit of large $n$)? First of all
the total density matrix is

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| .$$

Now, this matrix can be diagonalised

$$\rho = \sum_i r_i |r_i\rangle\langle r_i| ,$$

where $r_i$ and $|r_i\rangle$ are the eigenvalues and
eigenvectors. This decomposition is, of course,
indistinguishable from the original one (or any other
decomposition for that matter). Thus we can think about
compression in this new basis, which is easier as it behaves
completely classically (since
$\langle r_i|r_j\rangle = \delta_{ij}$). We can therefore
invoke results from the previous section on classical
typical sequences and conclude that the limit to compression
is $n\langle -\ln r_i\rangle$, i.e. $n$ qubits can be encoded
into $nS(\rho)$ qubits. No matter how the states are
generated, as long as the total state is described by the
same density matrix $\rho$ its compression limit is its von
Neumann entropy. This protocol and result will be very
important when we discuss entanglement measures in the
following section.
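
The statement that the compression limit depends only on $\rho$ can be illustrated as follows. This is a sketch assuming NumPy; the source emitting $|0\rangle$ and $|+\rangle$ with equal probability is a hypothetical example, and the entropy is taken to base 2 (an assumption made here so that the result reads directly as qubits per emitted state).

```python
import numpy as np

def density_matrix(probs, kets):
    """rho = sum_i p_i |psi_i><psi_i|."""
    return sum(p * np.outer(k, k.conj()) for p, k in zip(probs, kets))

def entropy_bits(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Original decomposition: non-orthogonal states with equal probabilities.
rho = density_matrix([0.5, 0.5], [ket0, ket_plus])

# Eigen-decomposition: orthogonal states |r_i> with probabilities r_i.
r_vals, r_vecs = np.linalg.eigh(rho)
rho_eig = density_matrix(r_vals, [r_vecs[:, i] for i in range(2)])

rate = entropy_bits(rho)            # qubits needed per emitted qubit
n = 1000
qubits_needed = int(np.ceil(n * rate))
print(rate, qubits_needed)
```

Both decompositions give the same density matrix and therefore the same rate, which is below one qubit per letter even though the two letters are equally likely.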



FIG. 4. This figure shows the two non-orthogonal states on
the Bloch sphere which are used to encode a message. The
overlap between them is $\sin\theta$ and the larger the
overlap, the more the total message can be compressed. In
terms of information, the less distinguishable the states
(i.e. the larger the overlap), the less information they
carry.

Example. Suppose that Alice encodes her bit into states
$|\Psi_0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$
and
$|\Psi_1\rangle = \sin(\theta/2)|0\rangle + \cos(\theta/2)|1\rangle$
with $p_0 = p_1 = 1/2$ (see Fig. 4). Classically it is not
possible to compress a source that generates 0 and 1 with
equal probability. Quantum mechanically, however,
compression can be achieved not only by the nature of the
probability distribution but also due to the
non-orthogonality of the states encoding symbols of the
message. In our example the overlap between the two states
is $\langle\Psi_0|\Psi_1\rangle = \sin\theta$ and they are
orthogonal only when $\theta = \pi$, in which case no
compression is possible. Otherwise, the larger the overlap
between the states, the more the message can be compressed.
Suppose Alice's messages are only 3 qubits long. Then there
are 8 different possibilities,
$|\Psi_0\Psi_0\Psi_0\rangle, \ldots, |\Psi_1\Psi_1\Psi_1\rangle$,
which are all equally likely with 1/8 probability. In
general these states will lie with a high probability within
a subspace of the 8 dimensional Hilbert space. Let us call
this likely subspace a "typical" subspace. Its orthogonal
complement will be unlikely and hence called an "atypical"
subspace. In order to find the typical and atypical
subspaces we need to diagonalise the "average" signal

$$\rho = \frac{1}{2}(|\Psi_0\rangle\langle\Psi_0| + |\Psi_1\rangle\langle\Psi_1|) .$$

Its diagonal form is

$$\rho = \frac{1}{2}(1 + \sin\theta)|+\rangle\langle +| + \frac{1}{2}(1 - \sin\theta)|-\rangle\langle -| ,$$

where $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$. Now
we look at the probabilities for each of the 8 messages to
lie along the new orthogonal basis
$|+++\rangle, \ldots, |---\rangle$ of the Hilbert space of
three qubits:

$$|\langle +++|\Psi^{\otimes 3}\rangle|^2 = \frac{1}{8}(\cos(\theta/2) + \sin(\theta/2))^6$$
$$|\langle ++-|\Psi^{\otimes 3}\rangle|^2 = \frac{1}{8}(\cos(\theta/2) + \sin(\theta/2))^4 (\cos(\theta/2) - \sin(\theta/2))^2$$
$$|\langle +--|\Psi^{\otimes 3}\rangle|^2 = \frac{1}{8}(\cos(\theta/2) + \sin(\theta/2))^2 (\cos(\theta/2) - \sin(\theta/2))^4$$
$$|\langle ---|\Psi^{\otimes 3}\rangle|^2 = \frac{1}{8}(\cos(\theta/2) - \sin(\theta/2))^6$$

where $|\Psi^{\otimes 3}\rangle$ represents any 3 qubit
sequence of $|\Psi_0\rangle$ and $|\Psi_1\rangle$. In
addition all the probabilities for $|++-\rangle$,
$|+-+\rangle$, $|-++\rangle$ are equal and so are the
probabilities for $|+--\rangle$, $|-+-\rangle$,
$|--+\rangle$. Thus the above equation contains 64
probabilities in total. Suppose now that
$\cos(\theta/2) \approx \sin(\theta/2)$. Then, we see that
the states containing two or more $+$ become much more
likely. This means that the message states are much more
likely to be in this particular subspace. Therefore the
compression would be as follows. First the source generates
three qubits in some state. Then we project this message
onto the typical subspace. If we are successful, then this
will lie in that four dimensional typical subspace for which
we need only two qubits rather than three. Otherwise,
our projection will fail and the message will end up in
the atypical subspace, in which case Alice does not compress
it. The probability to end up in the atypical space
asymptotically goes to zero (the law of large numbers).
Therefore in this example the limit to our compression
is given by
$-\frac{1}{2}(1 + \sin\theta)\ln\frac{1}{2}(1 + \sin\theta) - \frac{1}{2}(1 - \sin\theta)\ln\frac{1}{2}(1 - \sin\theta)$,
which is of course the von Neumann entropy of $\rho$. The
number of dimensions of the typical subspace of the total
Hilbert space is likewise in general equal to
$e^{nS(\rho)}$.
Interestingly, if instead of pure states a quantum source
generates mixed states $\rho_i$ with probabilities $p_i$,
then the best compression limit is in general unknown. We
can, of course, use the above protocol to compress the
sequence to the von Neumann entropy of the average signal,
$S(\sum_i p_i\rho_i)$. However, in some cases it is known
that a better compression can be achieved. The lower
bound to compression is the Holevo bound,
$S(\sum_i p_i\rho_i) - \sum_i p_i S(\rho_i)$, but it is not
known whether this bound can in general be attained (see
Horodecki, 1998b).

Next we look at a protocol for classical communication 
that involves entanglement. At first sight this protocol 
seems to violate the Holevo bound on classical communi- 
cation, i.e. that it is possible to communicate only 1 bit 
per single qubit. However, a closer inspection will show 
that this is not the case. 



C. Dense coding 

Now I consider the case of dense coding, which was
introduced by Bennett and Wiesner (1992). In this protocol
entanglement plays a crucial role and this will give us a
first indication of the fact that entanglement can be
quantified like any other resource, such as energy for
example.



Alice and Bob initially share an entangled pair of qubits
in some state $W_0$, which may be mixed. Alice then performs
local unitary operations on her qubit to put this
shared pair of qubits into either of the states
$W_0, W_1, W_2$ or $W_3$. In general, Alice may use a
completely arbitrary set of unitary operations to generate
these states:

$$W_i = (U_i \otimes I)\, W_0\, (U_i^\dagger \otimes I) , \qquad (46)$$

and the number of generated states is completely arbitrary.
In the above equation, $U_i$ acts on Alice's qubit
and $I$ acts on Bob's qubit. By sending her encoded qubit
to Bob, Alice is essentially communicating with Bob using
the states $W_0, W_1, W_2$ and $W_3$ as separate letters.
The number of bits she can communicate to Bob using
this procedure is thus bounded by the Holevo bound.
Moreover, if some block coding is done on a large enough
collection of qubits in addition to the dense coding, then
the number of bits of information communicated is equal
to the Holevo function. We will thus take

$$C = S(\rho) - \sum_i p_i S(\rho_i) , \qquad (47)$$

with $\rho_i = W_i$ and $\rho = \sum_i p_i W_i$, assuming
that any additional necessary block coding will
automatically be performed to supplement the dense coding.
This coding is essential in order to achieve the capacity
given by the Holevo bound in the asymptotic limit (the fact
that the bound is achievable follows from a complicated
argument and cannot really be derived using the arguments
presented in this review; Hausladen et al. (1996) have
proved this for pure states and Schumacher and Westmoreland
(1997) and independently Holevo (1998) for mixed states).
Exactly the same assumption has been used in Hausladen
et al. (1996) to calculate the capacity for dense coding in
the case of pure letter states. Eqs. (46) and (47) define
the most general version of dense coding and I shall refer
to this as completely general dense coding (CGDC).

A simpler example of dense coding is the case when the
letter states are generated from the initial shared state
$W_0$ by

$$W_0 = (I \otimes I)\, W_0\, (I \otimes I) , \qquad (48)$$
$$W_1 = (\sigma_1 \otimes I)\, W_0\, (\sigma_1 \otimes I) , \qquad (49)$$
$$W_2 = (\sigma_2 \otimes I)\, W_0\, (\sigma_2 \otimes I) , \qquad (50)$$
$$W_3 = (\sigma_3 \otimes I)\, W_0\, (\sigma_3 \otimes I) . \qquad (51)$$

In the above set of equations, the first operator of the
combination $\sigma_i \otimes I$ acts on Alice's qubit and
the second operator acts on Bob's qubit. I shall refer to
this case (i.e. when the letter states are generated by
Eqs. (48)-(51)) as simply general dense coding (GDC). The
generality present in GDC is that Alice is allowed to
prepare the different letter states with unequal
probabilities.

In the more special case when Alice not only generates
the four letter states according to Eqs. (48)-(51) but also
with equal probability, the ensemble is given by

$$W = \frac{1}{4}\sum_i W_i \qquad (52)$$

and the capacity becomes

$$C = \frac{1}{4}\sum_i S(W_i\|W) . \qquad (53)$$

I shall call this simplest case special dense coding (SDC).
Among all the possible ways of doing GDC, SDC is the
optimal way to communicate when $W_0$ is a pure state
(Bose et al., 2000a) or a Bell diagonal state.

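Now I derive the most general bound on CGDC
(Bowen, 2001). Furthermore, this bound can be attained
by the same protocol as SDC (Bowen, 2001). The proof
is achieved by first finding an upper bound to the capacity
for CGDC and then showing that SDC actually saturates this
bound. Suppose that the initial state of Alice and Bob is
$\rho_{AB}$. Then we have:

$$C = \max S\Big(\sum_k p_k (U^k \otimes I)\rho_{AB}((U^k)^\dagger \otimes I)\Big) - \sum_k p_k S\big((U^k \otimes I)\rho_{AB}((U^k)^\dagger \otimes I)\big)$$
$$= \max S(\rho'_{AB}) - S(\rho_{AB})$$
$$\le S(\rho'_A) + S(\rho'_B) - S(\rho_{AB})$$
$$\le 1 + S(\rho_B) - S(\rho_{AB}) \qquad (54)$$

Here $\rho'_{AB}$ is the average state after Alice's
encoding. The second line uses the invariance of the entropy
under unitary operations, the third is subadditivity, and
the last uses the fact that Alice's qubit carries at most
one bit of entropy while her local operations leave Bob's
reduced state unchanged, $\rho'_B = \rho_B$. Since this bound
is achievable as shown by Bowen (2001), the capacity for
CGDC is given by eq. (54).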
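
That SDC reaches the bound of eq. (54) can be checked for a mixed shared state. The sketch below assumes NumPy; the Werner-type state with weight 0.8 on a Bell state is a hypothetical example, entropies are in bits, and the helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2)
PAULIS = [I2,
          np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]])]

def entropy_bits(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def reduced_state_B(rho_ab):
    """Trace out Alice's qubit (the first tensor factor)."""
    return np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=0, axis2=2)

def sdc_capacity(rho_ab):
    """Holevo quantity of the four equiprobable letters of Eqs. (48)-(51)."""
    letters = [np.kron(U, I2) @ rho_ab @ np.kron(U, I2).conj().T for U in PAULIS]
    average = sum(letters) / 4
    return entropy_bits(average) - sum(entropy_bits(W) for W in letters) / 4

def cgdc_bound(rho_ab):
    """Right-hand side of eq. (54)."""
    return 1 + entropy_bits(reduced_state_B(rho_ab)) - entropy_bits(rho_ab)

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi)
werner = 0.8 * bell + 0.2 * np.eye(4) / 4         # hypothetical mixed state
product = np.zeros((4, 4))
product[0, 0] = 1.0                               # unentangled state |00>

for name, state in [("bell", bell), ("werner", werner), ("product", product)]:
    print(name, sdc_capacity(state), cgdc_bound(state))
```

The agreement reflects the fact that averaging over the four Pauli operations maps Alice's qubit to the maximally mixed state, so the average letter state is $(I/2) \otimes \rho_B$.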

I shall now restrict my attention to a calculation of $C$
for pure letter states. Consider the initial shared pure
state $W_0$ to be

$$|\psi_0\rangle = (a|00\rangle + b|11\rangle) . \qquad (55)$$

Then, according to Eqs. (48)-(51), the other letter states
are given by

$$|\psi_1\rangle = (a|10\rangle + b|01\rangle) , \qquad (56)$$
$$|\psi_2\rangle = -i(a|10\rangle - b|01\rangle) , \qquad (57)$$
$$|\psi_3\rangle = (a|00\rangle - b|11\rangle) , \qquad (58)$$

from which we obtain $W_i = |\psi_i\rangle\langle\psi_i|$. As
all $W_i$ are pure states we have

$$S(W_i) = 0 . \qquad (59)$$

Thus we have

$$C = S(W) . \qquad (60)$$

I will consider only the case of SDC as it is optimal. Thus
the ensemble used is obtained from Eq. (52) to be

$$W = \frac{|a|^2}{2}|00\rangle\langle 00| + \frac{|b|^2}{2}|01\rangle\langle 01| + \frac{|a|^2}{2}|10\rangle\langle 10| + \frac{|b|^2}{2}|11\rangle\langle 11| .$$

Thus from Eq. (60) for the capacity $C$, we get

$$C = -\Big(|a|^2\log\frac{|a|^2}{2} + |b|^2\log\frac{|b|^2}{2}\Big)$$
$$= 1 - (|a|^2\log|a|^2 + |b|^2\log|b|^2) . \qquad (61)$$

(Note that this agrees with eq. (54), as for pure states
the total entropy is zero.) Now this implies that a good
measure of entanglement for a pure state of a system
composed of two subsystems A and B can be given by
the von Neumann entropy of the state of either of the
subsystems. Let us call this measure the von Neumann
entropy of entanglement and label it by $E_v$ (Popescu and
Rohrlich, 1997; Bennett et al., 1996a). Thus

$$E_v(|\psi\rangle_{A+B}) = S(\mathrm{Tr}_A(|\psi\rangle\langle\psi|_{A+B})) ,$$

where $\mathrm{Tr}_A$ stands for partial trace over states
of system A. Therefore, for all the states $W_i$,

$$E_v(W_i) = -(|a|^2\log|a|^2 + |b|^2\log|b|^2) .$$

Thus,

$$C = 1 + E_v(W_i) .$$

[Figure 5: plot of $C = 1 - x\log x - (1-x)\log(1-x)$ against $x$ from 0 to 1.]

FIG. 5. This figure shows the dependence of capacity for
dense coding for pure states $a|00\rangle + b|11\rangle$ as a
function of its Schmidt coefficient $x = |a|^2$. We see that
when the state is disentangled, i.e. when either $a = 0$ or
$b = 0$, the capacity becomes 1 bit per qubit, i.e. the same
as the "classical capacity".
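
The curve of Fig. 5 follows from eq. (61) with $x = |a|^2$. The sketch below uses only the standard library and takes logarithms to base 2, so that the capacity is in bits; the function names are illustrative, and unnormalised amplitudes are normalised inside the function as a convenience.

```python
import math

def binary_entropy_bits(x):
    """-x log2 x - (1 - x) log2 (1 - x), with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def dense_coding_capacity(a, b):
    """Eq. (61) for the shared pure state a|00> + b|11>, in bits."""
    norm = abs(a) ** 2 + abs(b) ** 2
    x = abs(a) ** 2 / norm                 # Schmidt coefficient |a|^2
    return 1.0 + binary_entropy_bits(x)

for a, b in [(1.0, 0.0), (math.sqrt(0.9), math.sqrt(0.1)), (1.0, 1.0)]:
    print(a, b, dense_coding_capacity(a, b))
```

The first case is a product state and the last a maximally entangled state, which are the two end points discussed in the caption.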

Fig. 5 shows how this capacity depends on the entanglement
of the shared state. We can prove that for pure states, SDC
(using all alphabet states with equal a priori probability)
is the optimal way to communicate among all possible ways
of doing GDC (i.e. when the letter states are generated
by Eqs. (48)-(51)) (Bose et al., 2000a). Important, however,
is the fact that the amount of entanglement determines
exactly how much information Alice can convey to
Bob. Note that if there is no entanglement shared between
them, then the amount of information is exactly
one bit per Alice's qubit (which is what can be achieved
classically after all). At the other extreme, when they
share a maximally entangled state, the amount of information
is 2 bits per Alice's qubit. This is an amount that
no purely classical communication can achieve. However,
while the von Neumann entropy is a good measure of
entanglement for pure states (in fact, there are arguments
that it is unique for pure states (Popescu and Rohrlich,
1997)), it fails when we try to apply it to mixed states.
A possibility is to follow the logic of the pure state dense
coding and call $S(\rho_B) - S(\rho_{AB})$ a measure of
entanglement for mixed states, as in eq. (54). This measure
has been called the "coherent information" and is used to
describe information transmission through a noisy quantum
channel, as in e.g. Barnum et al. (1998). But is
this measure consistent with other natural requirements
for quantifying entanglement? This question will be
addressed in the next section. Before this, we show that
in order to delete a certain amount of correlations we
need to increase the entropy of the environment by at
least this amount. This is known as Landauer's erasure
(Landauer, 1961) and is seen to be linked directly to the
relative entropy.

D. Relative entropy, thermodynamics and 
information erasure 

We have seen that communication essentially creates 
correlations between the sender and the receiver. Creat- 
ing correlations is therefore very important in order to 
be able to convey any information. However, I would 
now like to talk about the opposite process - deleting 
correlations. Why would one want to do so? The rea- 
son is that we might want to correlate one system to 
another and may need to delete all its previous corre- 
lations to be able to store new ones. I would like to 
give a more physical statement of information erasure 
and link it to the notion of measurement. I will there- 
fore introduce two correlated parties - a system and an 
apparatus. The apparatus will interact with the system 
thereby gaining a certain amount of information about 
it (the full quantum description of this process will be 
presented in section V). Suppose now that the appa- 
ratus needs to measure another system. We first need 
to delete information about the last system before we 
can make another measurement. The most general way
of conducting erasure (resetting) of the apparatus is by
employing a reservoir in thermal equilibrium at a certain
temperature $T$. To erase the state of the apparatus we
just throw it into the reservoir and introduce a new pure
state. The entropy increase of the operation now consists
of two parts. Firstly, the state of the apparatus
evolves to the state of the reservoir and this entropy is
now added to the reservoir entropy. Secondly, the rest of
the reservoir changes its entropy due to this interaction,
by an amount set by the difference in the apparatus internal
energy before and after the resetting (no work is done in
this process). This quantum approach to equilibrium was also
studied by Partovi (1989). A good model is obtained by
imagining that the reservoir consists of a great number of
systems (of the same "size" as the apparatus) all in the
same quantum equilibrium state $\omega$. Then the apparatus,
which is in some state $\rho$, interacts with these reservoir
systems one at a time. Each time there is an interaction, 
the state of the apparatus approaches more closely the 
state of the reservoir, while that single reservoir system 
also changes its state away from the equilibrium. How- 
ever, the systems in the bath are numerous so that after 
a certain number of collisions the apparatus state will 
approach the state of the reservoir, while the reservoir 
will not change much since it is very large (this is equiv- 
alent to the so called Born-Markov approximation that 
leads to irreversible dynamics of the apparatus described 
here). 

Bearing all this in mind, we now reset the apparatus
by plunging it into a reservoir in thermal equilibrium
(Gibbs state) at temperature $T$. Let the state of the
reservoir be

$$\omega = \frac{e^{-\beta H}}{Z} = \sum_j q_j |\epsilon_j\rangle\langle\epsilon_j| ,$$

where $H = \sum_j \epsilon_j |\epsilon_j\rangle\langle\epsilon_j|$
is the Hamiltonian of the reservoir,
$Z = \mathrm{Tr}(e^{-\beta H})$ is the partition function and
$\beta^{-1} = kT$, where $k$ is the Boltzmann constant. Now
suppose that due to the measurement the entropy of the
apparatus is $S(\rho)$ (and an amount $S(\rho)$ of
information has been gained), where
$\rho = \sum_i r_i |r_i\rangle\langle r_i|$ is the eigen
expansion of the apparatus state. Now the total entropy
increase in the erasure is (there are two parts as I argued
above: 1. change in the entropy of the apparatus and
2. change in the entropy of the reservoir)

$$\Delta S_{er} = \Delta S_{app} + \Delta S_{res} .$$

We immediately know that $\Delta S_{app} = S(\omega)$, since
the state of the apparatus (no matter what state it was in
before) is now erased to be the same as that of the
reservoir. On the other hand, the entropy change in the
reservoir is the average over all states $|r_k\rangle$ of the
heat received by the reservoir divided by the temperature.
This is minus the heat received by the apparatus divided by
the temperature; the heat received by the apparatus is the
internal energy after the resetting minus the initial
internal energy $\langle r_k|H|r_k\rangle$. Thus,

$$\Delta S_{res} = -\sum_k r_k \frac{\mathrm{Tr}(\omega H) - \langle r_k|H|r_k\rangle}{kT}$$
$$= \sum_k r_k \sum_j (|\langle\epsilon_j|r_k\rangle|^2 - q_j)(-\log q_j - \log Z)$$
$$= -\mathrm{Tr}(\rho - \omega)(\log\omega + \log Z) = \mathrm{Tr}(\omega - \rho)\log\omega .$$

Altogether we have an exact expression for the amount
of entropy increase due to deletion:

Entropy increase due to Landauer's erasure.

$$\Delta S_{er} = -\mathrm{Tr}(\rho\log\omega)$$

This result (Vedral, 2000) generalizes Lubkin's result,
which applies only when $[\rho, \omega] = 0$. In general,
however, the information gain is equal to $S(\rho)$, the
entropy increase in the apparatus. This entropy increase is
a maximum; the information between the system and apparatus
is usually smaller, as in eq. (42). Thus, we see that

$$\Delta S_{er} = -\mathrm{Tr}(\rho\log\omega) \ge S(\rho) \ge I$$

and Landauer's principle is confirmed (the inequality
follows from the fact that the quantum relative entropy
$S(\rho\|\omega) = -\mathrm{Tr}(\rho\log\omega) - S(\rho)$ is
non-negative). So the erasure is the least wasteful when
$\omega = \rho$, in which case the entropy of erasure is
equal to $S(\rho)$, the information gain. This is when the
reservoir is in the same state as the state of the apparatus
we are trying to erase. In this case we simply swap the old
state $\rho$ of the apparatus for the new pure state.
Curiously enough, creating correlations is not costly in
terms of the entropy of the environment (such as when Alice
and Bob communicate).
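
The erasure formula and the Landauer inequality can be checked numerically. The sketch below assumes NumPy; the two-level reservoir Hamiltonian, the inverse temperature and the apparatus state are hypothetical values, and the apparatus state is deliberately chosen not to commute with $\omega$.

```python
import numpy as np

def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def matrix_log(omega):
    """Logarithm of a full-rank density matrix via its eigen expansion."""
    vals, vecs = np.linalg.eigh(omega)
    return vecs @ np.diag(np.log(vals)) @ vecs.conj().T

def gibbs_state(hamiltonian, beta):
    vals, vecs = np.linalg.eigh(hamiltonian)
    weights = np.exp(-beta * vals)
    weights /= weights.sum()                       # division by Z
    return vecs @ np.diag(weights) @ vecs.conj().T

def erasure_entropy(rho, omega):
    """Total entropy increase -Tr(rho log omega)."""
    return float(-np.real(np.trace(rho @ matrix_log(omega))))

def erasure_entropy_two_parts(rho, omega):
    """S(omega) for the apparatus plus Tr((omega - rho) log omega) for the reservoir."""
    reservoir = np.real(np.trace((omega - rho) @ matrix_log(omega)))
    return float(entropy(omega) + reservoir)

omega = gibbs_state(np.diag([0.0, 1.0]), beta=1.0)
rho = np.array([[0.5, 0.2], [0.2, 0.5]])           # eigenvalues 0.7 and 0.3

print(erasure_entropy(rho, omega), entropy(rho))
print(erasure_entropy(rho, rho))
```

The gap between the two numbers on the first line is the relative entropy $S(\rho\|\omega)$; it closes when the reservoir state equals the state being erased, as the second line shows.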

Landauer's erasure is a statement which is equivalent
to the Second Law of Thermodynamics. If we could
delete information without increasing entropy, then we
could construct a machine that completely converts heat
into work with no other effect, which contradicts the
Second Law. The opposite is also true. Namely, if we could
convert heat into work with no other effect, then we could
use this energy to delete information with no entropy
increase (Penrose, 1973; Landauer, 1961). Thus, the relative
entropy provides an interesting link between thermodynamics,
information theory and quantum mechanics
(also see Brillouin's excellent book (Brillouin, 1956)).

Landauer's erasure and data compression. I will
now show how Landauer's principle can be used to derive
the limit to quantum data compression. The free
energy lost in deleting information stored in a string of $n$
qubits, all in the state $\rho$, is $n\beta^{-1}S(\rho)$. However, we could
first compress this string and then delete the resulting
information. The free energy loss after the compression
is $m\beta^{-1}\log 2 = m\beta^{-1}$, where the string has been compressed
to $m$ qubits. The two free energies before and
after compression should be equal if no information is
lost during compression, i.e. if we wish to have maximal
efficiency, and therefore $m/n = S(\rho)$ as shown previously
(c.f. (Feynman, 1996)). The equality is, of course, only
achieved asymptotically.
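
As a worked instance of the ratio $m/n = S(\rho)$, the short Python sketch below evaluates the asymptotic compressed length for a string of qubits. The spectrum `[0.9, 0.1]` and the string length are hypothetical values chosen for illustration, and logarithms are taken to base 2 so that $\log 2 = 1$ as in the text.

```python
import math

def entropy_bits(eigenvalues):
    """Von Neumann entropy in bits, given the eigenvalues of rho."""
    return -sum(p * math.log2(p) for p in eigenvalues if p > 0)

def compressed_length(n, eigenvalues):
    """Asymptotic number of qubits m = n S(rho) needed to store n copies of rho."""
    return n * entropy_bits(eigenvalues)

n = 1000
spectrum = [0.9, 0.1]                # hypothetical single-qubit spectrum
m = compressed_length(n, spectrum)   # roughly 469 qubits instead of 1000
```

Equating the free energy before compression, $n\beta^{-1}S(\rho)$, with that after compression, $m\beta^{-1}$, gives the same number.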

So far we have seen that the entropy plays a pivotal 
role in communication theory and data compression as a 
limit to both communication capacity and compression. 
It also quantifies the amount of entanglement in a pure 
bi-partite state. Finally, it plays a thermodynamical role,
characterizing the mixedness in a certain quantum state.
This last role was introduced first by von Neumann. Now
we go beyond the classical use of quantum states towards
looking at how we can achieve quantum communication
of quantum states.

IV. QUANTUM COMMUNICATION: QUANTUM USE

In this section the problem of entanglement quantifi- 
cation is analysed. Previously we have seen that the re- 
duced von Neumann entropy is a good measure of entan- 
glement for two subsystems in a joint pure state (see also 
Bennett et al. (1996a)). This is a consequence of the
Schmidt decomposition procedure introduced earlier and 
was exemplified in the dense coding. However, for the 
mixed states of two subsystems, or for more than two sub- 
systems this procedure does not exist in general. There- 
fore it is not immediately clear how to understand and 
quantify correlations for these states. Initially, we might 
think that Bell's inequalities (Bell, 1987; Clauser et al.,
1969; Redhead, 1987) would provide a good criterion 
for separating quantum correlations (entanglement) from 
classical correlations in a given quantum state. States 
that violate Bell's inequalities would be entangled and 
other states would be disentangled. However, while it is 
true that a violation of Bell's inequalities is a signature 
of quantum correlations, not all entangled states violate 
Bell's inequalities (Gisin, 1996). So, in order to
completely separate quantum from classical correlations we
need a different criterion. 

I will present here an approach that has proven to be 
very fruitful in understanding entanglement in general. 
It begins by presenting a set of conditions that any rea- 
sonable measure of entanglement has to satisfy. Then, I 
discuss possible candidates based on this criterion. 

A. Quantifying entanglement 

In this section we will mainly focus on understand- 
ing entanglement of bi-partite systems, i.e. systems con- 
taining two subsystems only. The term entanglement, or 
Verschränkung as it was originally called, was introduced by
Schrödinger (1935) to emphasise the bizarre implications of
quantum mechanics. The reason for studying bi-partite 
entanglement is that it is the simplest and most basic 
kind of entanglement and is well understood at present. 
Also, starting from bi-partite entanglement we will build 
up a theory that can be generalized to any number of 
systems. So, unless stated otherwise, the presentation in 
this subsection is confined to bi-partite systems only. 

To determine the basic properties every "good" entanglement
measure should satisfy (Vedral et al., 1997a;
Vedral and Plenio, 1998), we have to discuss what we
actually mean when we say that something is
"disentangled". By definition a bi-partite state
is disentangled if it can be written in the separable form
$\rho_{AB} = \sum_i p_i \rho_i^A \otimes \rho_i^B$ (Werner, 1989). It is clear why
we choose to define disentangled states in this manner:
these are the most general states which can be
created by Alice and Bob by local operations and classical
communication (LOCC). Thus these states contain
no entanglement, as entanglement can only be created
through global operations. All other states will be
entangled to some degree. In addition note that the set of
all disentangled states is convex: a convex combination
(mixture) of any two disentangled states is itself
disentangled. This fact will be important when we quantify
entanglement later.

So the first question to answer is the following: "when
can a given matrix be written in a separable form?". The
necessary and sufficient condition is known in general in
terms of positive (but not necessarily completely positive)
maps (Peres, 1996; Horodecki et al., 1996). Suppose
that $\Lambda$ is any positive map; then,

$$(I_A \otimes \Lambda_B)\Big(\sum_i p_i \rho_i^A \otimes \rho_i^B\Big) = \sum_i p_i \rho_i^A \otimes \Lambda_B(\rho_i^B) \qquad (62)$$

is always a positive operator. Remarkably, the converse
is also true. If, for all positive maps $\Lambda$, the operator
$(I_A \otimes \Lambda_B)(\rho_{AB})$ is positive, then $\rho_{AB}$ is separable
(disentangled). Therefore, if we want to know whether a
given state $\rho_{AB}$ is entangled, we need to find a positive
map whose action on B will result in a negative operator
and hence not a physical state (Horodecki, 2000a).
This condition is still not operational since there is an
infinite number of positive maps to search. In fact, there
is no operational condition in general; one exists only
in some special cases. For example, for two qubits and
a qubit and a qutrit (three level system), this condition
simplifies to the following (Peres, 1996; Horodecki et al.,
1996): such a state is entangled iff a transposition of B
results in a negative operator, i.e. the partially transposed
matrix $\rho_{AB}^{T_B}$ has a negative eigenvalue. The relationship
between positive maps and entanglement is a very
active field of research and I refer the interested reader
to some papers investigating this issue: Bennett et al.
(1999b), Kraus et al. (2000), DiVincenzo et al. (2000) and
Lewenstein et al. (2000). With this in mind, I turn to
quantifying entanglement.
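
The two-qubit transposition test can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure of the cited papers: it is restricted to real symmetric density matrices, it uses a textbook Jacobi rotation routine (`symmetric_eigenvalues`, a name chosen here) to avoid external libraries, and the family `werner_state(p)` is a hypothetical test case that does not appear in the text.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def partial_transpose_b(rho):
    """Transpose subsystem B of a two-qubit matrix (basis |ab>, index 2a + b)."""
    out = [[0.0] * 4 for _ in range(4)]
    for a in range(2):
        for b in range(2):
            for a2 in range(2):
                for b2 in range(2):
                    out[2 * a + b][2 * a2 + b2] = rho[2 * a + b2][2 * a2 + b]
    return out

def werner_state(p):
    """p |Psi-><Psi-| + (1 - p) I/4, a hypothetical one-parameter test family."""
    rho = [[(1 - p) / 4 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for i, j, sign in ((1, 1, 1), (2, 2, 1), (1, 2, -1), (2, 1, -1)):
        rho[i][j] += sign * p / 2
    return rho

def is_entangled(rho, tol=1e-12):
    """Two-qubit test: entangled iff the partial transpose has a negative eigenvalue."""
    return min(symmetric_eigenvalues(partial_transpose_b(rho))) < -tol

lowest = min(symmetric_eigenvalues(partial_transpose_b(werner_state(0.5))))
```

For this family the lowest eigenvalue of the partial transpose is $(1 - 3p)/4$, so the test reports entanglement exactly when $p > 1/3$.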

The first property we need from an entanglement measure
is that a disentangled state does not have any quantum
correlations. This gives rise to our first condition:

E1) For any separable state $\sigma$ the measure of entanglement
should be zero, i.e.

$$E(\sigma) = 0 . \qquad (63)$$

Note that we do not ask the converse, i.e. that if $E(\sigma) = 0$,
then $\sigma$ is separable. The reason for this will become
clear below.

The next condition concerns the behavior of the entanglement
under simple local unitary transformations. A
local unitary transformation simply represents a change
of the basis in which we consider the given entangled
state. But a change of basis should not change the
amount of entanglement that is accessible to us, because
at any time we could just reverse the basis change (since
unitary transformations are fully reversible).

E2) For any state $\sigma$ and any local unitary transformation,
i.e. a unitary transformation of the form
$U_A \otimes U_B$, the entanglement remains unchanged.
Therefore

$$E(\sigma) = E(U_A \otimes U_B\, \sigma\, U_A^\dagger \otimes U_B^\dagger) . \qquad (64)$$

The third condition is the one that really restricts the
class of possible entanglement measures. Unfortunately
it is usually also the property that is the most difficult to
prove for potential measures of entanglement. We have
already proved that no good measure of correlations between
two subsystems should increase under local operations
on the subsystems separately. However, quantum
entanglement is even more restrictive in that the total
amount of entanglement cannot increase locally even
with the aid of classical communication. Classical correlations,
on the other hand, can be increased by LOCC.

Example. Suppose that Alice and Bob share $n$ uncorrelated
pairs of qubits, for example all in the state $|0\rangle$.
Alice's computer then interacts with each of her qubits
such that it randomly flips each qubit with probability
1/2. However, whenever a qubit is flipped, Alice's computer
(classically) calls Bob's computer and informs it to
do likewise. After this action on all the qubits, Alice and
Bob end up sharing $n$ (maximally) correlated qubits in
the state $(|00\rangle\langle 00| + |11\rangle\langle 11|)/2$, i.e. whenever Alice's qubit
is zero so is Bob's and whenever Alice's qubit is one so is
Bob's. The state of each pair is mixed because Alice and
Bob do not know whether their computers flipped their
respective qubits or not.
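
The example can be mimicked with a purely classical simulation. The sketch below assumes that each qubit may be modelled by a classical bit, which is adequate here only because the final state is diagonal in the computational basis; the function name and the seed are illustrative.

```python
import random

def correlate_by_locc(n, rng):
    """Alice flips each qubit with probability 1/2 and tells Bob to do likewise."""
    alice = [0] * n   # every pair starts in |0>|0>
    bob = [0] * n
    for k in range(n):
        if rng.random() < 0.5:   # Alice's local random flip
            alice[k] ^= 1
            bob[k] ^= 1          # Bob flips after receiving the classical message
    return alice, bob

alice, bob = correlate_by_locc(1000, random.Random(0))
```

Every pair ends up perfectly correlated (`alice == bob`), yet neither party can predict an individual outcome. The correlations were created by local operations and a classical message alone, so they carry no entanglement.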

We can always calculate the total amount of entanglement
by summing up the entanglement of all systems
after we have applied our local operations and classical
communications.

E3) Local operations, classical communication and subselection
cannot increase the expected entanglement,
i.e. if we start with an ensemble in state
$\sigma$ and end up with probability $p_i$ in subensembles
in state $\sigma_i$ then we will have

$$E(\sigma) \ge \sum_i p_i E(\sigma_i) \qquad (65)$$

where $\sigma_i = A_i \otimes B_i\, \sigma\, A_i^\dagger \otimes B_i^\dagger / p_i$ and
$p_i = \mathrm{Tr}(A_i \otimes B_i\, \sigma\, A_i^\dagger \otimes B_i^\dagger)$.
The form $A \otimes B$ shows that Alice and Bob perform
their operation locally (i.e. Alice cannot affect Bob's system
and vice versa). However, Alice's and Bob's operations
can be correlated, which is manifested in the fact
that they have the same index. It should be pointed out
that although all the LOCC can be cast in the above
product form, the opposite is not true: not all the operations
of the product form can be executed locally (Bennett
et al., 1999a). This means that the above condition
is more restrictive than necessary, but this does not have
any significant consequences as far as I am aware. An
example of an E3 operation is local addition of particles on
Alice's and Bob's side. Note also that E2 operations are
a subset (special case) of E3 operations.

The last condition is there to make sure that our mea- 
sure is consistent with pure states. 

E4) Entanglement of a pure state is equal to the reduced 
von Neumann entropy. 
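
For two qubits, conditions E4 and E2 can be illustrated together. The Python sketch below is an illustration under stated assumptions: it represents a pure state by its 2x2 amplitude matrix `c[a][b]`, uses the fact that $\rho_A = cc^\dagger$ has unit trace and determinant $|\det c|^2$, and introduces `rotation` as a hypothetical family of single-qubit unitaries.

```python
import cmath
import math

def entanglement_entropy(c):
    """Reduced von Neumann entropy (bits) of a normalised two-qubit pure state.

    c[a][b] is the amplitude of |a>|b>. rho_A = c c^dagger has trace 1 and
    determinant |det c|^2, which fixes its two eigenvalues.
    """
    det = abs(c[0][0] * c[1][1] - c[0][1] * c[1][0]) ** 2
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = ((1 + disc) / 2, (1 - disc) / 2)
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)

def apply_local_unitaries(c, ua, ub):
    """Amplitudes of (U_A x U_B)|psi>, i.e. the matrix U_A c U_B^T."""
    return [[sum(ua[a][i] * c[i][j] * ub[b][j] for i in range(2) for j in range(2))
             for b in range(2)] for a in range(2)]

def rotation(theta, phi):
    """A hypothetical two-parameter family of single-qubit unitaries."""
    return [[math.cos(theta), -cmath.exp(1j * phi) * math.sin(theta)],
            [cmath.exp(-1j * phi) * math.sin(theta), math.cos(theta)]]

psi = [[math.sqrt(0.8), 0.0], [0.0, math.sqrt(0.2)]]   # sqrt(0.8)|00> + sqrt(0.2)|11>
rotated = apply_local_unitaries(psi, rotation(0.7, 0.3), rotation(1.9, -1.1))
before = entanglement_entropy(psi)      # about 0.722 bits
after = entanglement_entropy(rotated)   # unchanged by the local unitaries
```

The two values agree because a local unitary changes $\det c$ only by a phase, which is the content of E2 for this measure.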

The above conditions are natural and easy to under- 
stand physically. However, they can be reduced to sim- 
pler and more elementary conditions, which I now briefly 
discuss. Suppose that we ask that the measure of entan- 
glement is: 

1. Weakly additive, i.e. $E(\rho \otimes \rho) = 2E(\rho)$;

2. Continuous, i.e. if $\rho$ is close to $\sigma$, then $E(\rho)$ is close
to $E(\sigma)$.

Then, it can be shown (Popescu and Rohrlich, 1997; Vi- 
dal, 2000) that E4 is a consequence of the weak additivity 
and continuity (providing we assume that the entangle- 
ment of a maximally entangled state is normalised to 
log 2). Also, in E3 we use the most general local POVMs, 
but we know that these can be implemented by adding 
ancillas locally, performing a unitary transformation on 
the system and ancilla locally and then tracing out the 
ancillas. So, E2 and E3 can be presented in a more ele- 
mentary way as was done by Vidal (2000). Thus, E2-E4 
can be written in terms of more fundamental processes 
and properties. However, I chose to introduce entangle- 
ment measures via E1-E4 as I think that they are more 
intuitive and capture the main ideas. Readers interested 
in further analysis of the conditions are advised to read: 
Vidal (2000) and Horodecki et al. (2000b). 

Before I introduce different entanglement measures I
would like to discuss the following question: "What do
we mean by saying that a state $\sigma$ can be converted into
another state $\rho$ by LOCC?". Strictly speaking, we mean
that there exists an LOCC operation that, given a sufficiently
large number of copies, $n$, of $\sigma$, will convert them
arbitrarily close to $m$ copies of the state $\rho$, i.e.,

$$(\forall \epsilon > 0)(\forall m \in \mathbb{N})(\exists n \in \mathbb{N};\ \exists \Phi \in \mathrm{LOCC})\quad ||\Phi(\sigma^{\otimes n}) - \rho^{\otimes m}|| < \epsilon \qquad (66)$$

where $||\sigma - \rho||$ is some measure of distance (metric) on the
set of density matrices. Now, if $\sigma$ is more entangled than
$\rho$, we expect that there is an LOCC such that $m > n$;
otherwise, we expect that we can only have $m \le n$. Measuring
entanglement now reduces to finding an appropriate
function on the set of states to order them according to
their local convertibility. This is usually achieved by letting
either $\sigma$ or $\rho$ be a maximally entangled state.



I now introduce three different measures of entanglement,
all of which obey E1-E4. First I discuss the entanglement
of formation (Bennett et al., 1996b). Bennett
et al. (1996b) define the entanglement of formation of a
state $\rho$ by

Entanglement of formation.

$$E_F(\rho) := \min \sum_i p_i S(\rho_A^i) \qquad (67)$$

where $S(\rho_A) = -\mathrm{Tr}\,\rho_A \ln \rho_A$ is the von Neumann entropy
and the minimum is taken over all the possible
realizations of the state, $\rho_{AB} = \sum_j p_j |\psi_j\rangle\langle\psi_j|$ with
$\rho_A^i = \mathrm{Tr}_B(|\psi_i\rangle\langle\psi_i|)$. This measure satisfies E1-E4. The
basis of formation is that Alice and Bob would like to
create an ensemble of $n$ copies of the non-maximally entangled
state, $\rho_{AB}$, using only local operations, classical
communication, and a number, $m$, of maximally entangled
pairs (see Fig. 6). Entanglement of formation is the
asymptotic conversion ratio, $m/n$, in the limit of infinitely
many copies. The form of this measure given in eq. (67)
will be more transparent after the next subsection. Furthermore,
I will analyse the relationship between the entanglement
of formation and other measures proposed in
more detail later. It is worth mentioning that a closed
form for this measure exists for two qubits (Wootters,
1998).
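
The role of the minimisation in eq. (67) can be seen in a small example. The Python sketch below is an illustration only: it compares two hypothetical ensembles that realise the same two-qubit density matrix but have different average entanglement, which is why a single decomposition gives only an upper bound on the entanglement of formation. States are stored as amplitude lists over $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and the function names are chosen here.

```python
import math

S = 1 / math.sqrt(2)

def entanglement_entropy(psi):
    """Reduced entropy (bits) of a normalised two-qubit pure state [c00, c01, c10, c11]."""
    c00, c01, c10, c11 = psi
    det = abs(c00 * c11 - c01 * c10) ** 2          # determinant of rho_A
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = ((1 + disc) / 2, (1 - disc) / 2)
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)

def density_matrix(ensemble):
    """rho = sum_i p_i |psi_i><psi_i| for an ensemble [(p_i, psi_i), ...]."""
    rho = [[0j] * 4 for _ in range(4)]
    for p, psi in ensemble:
        for i in range(4):
            for j in range(4):
                rho[i][j] += p * psi[i] * complex(psi[j]).conjugate()
    return rho

def average_entanglement(ensemble):
    """sum_i p_i S(rho_A^i): the quantity minimised in the entanglement of formation."""
    return sum(p * entanglement_entropy(psi) for p, psi in ensemble)

bell_ensemble = [(0.5, [S, 0, 0, S]), (0.5, [S, 0, 0, -S])]        # two Bell states
product_ensemble = [(0.5, [1, 0, 0, 0]), (0.5, [0, 0, 0, 1])]      # |00> and |11>
cost_bell = average_entanglement(bell_ensemble)        # 1 bit per copy
cost_product = average_entanglement(product_ensemble)  # 0 bits per copy
```

Both ensembles give the same density matrix, so the minimum in eq. (67) is zero for this state even though one of its decompositions consists entirely of maximally entangled states.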

Related to this measure is the entanglement of distillation,
also introduced by Bennett et al. (1996b).

Entanglement of distillation. This measure defines
the amount of entanglement of a state $\sigma$ as the asymptotic
proportion of singlets that can be distilled using a
purification procedure (for a rigorous definition see Rains
(1999)). This is the opposite process to that leading to
the entanglement of formation (Fig. 6), although its
value is generally smaller. This implies that formation
of states is in some sense irreversible. The reason for this
irreversibility will be explained in the next sub-section.
This measure fails to satisfy the converse of E1. Namely,
for all disentangled states the entanglement of distillation
is zero, but the converse is not true. There exist
states which are entangled, but no entanglement can be
distilled from them and, for this reason, they are called
bound entangled (Horodecki et al., 1998a) (see also
DiVincenzo (2000)). This is the reason why the condition E1 is
not stated as both the necessary and sufficient condition.



[Figure 6. Labels recoverable from the image: "Maximally Entangled Pairs", "Non Maximally Entangled Pairs", "DISTILLATION".]



FIG. 6. This figure illustrates formation of entangled states:
a certain number of maximally entangled pairs is manipulated
by LOCC and converted into pairs in some state $\rho$. The
asymptotic conversion ratio is known as the entanglement
of formation. The converse of formation is distillation of
entanglement. Again, the asymptotic rate of converting pairs
in state $\rho$ into maximally entangled states is known as the
entanglement of distillation. The two measures of entanglement
are in general different, formation being greater than
or equal to distillation. This surprising irreversibility of
entanglement conversion is explained in the text as a consequence
of loss of classical information about the decomposition of $\rho$.



I now introduce the final measure of entanglement
which was first proposed in Vedral et al. (1997a). This
measure is intimately related to the entanglement of distillation
by providing an upper bound for it. If $\mathcal{D}$ is the
set of all disentangled states, the measure of entanglement
for a state $\sigma$ is then defined as

Relative entropy of entanglement.

$$E(\sigma) := \min_{\rho \in \mathcal{D}} S(\sigma||\rho) \qquad (68)$$

where $S(\sigma||\rho)$ is the quantum relative entropy. This measure,
which I will call the relative entropy of entanglement,
tells us that the amount of entanglement in $\sigma$ is
its distance from the disentangled set of states. In statistical
terms introduced in Section II, the more entangled
a state is the more it is distinguishable from a disentangled
state (Vedral et al., 1997b). To understand better
all three measures of entanglement we need to introduce
another quantum protocol that relies fundamentally on
entanglement.

Another condition which might be considered intuitive
for a measure of entanglement is convexity. Namely, we
might require that

$$E\Big(\sum_i p_i \sigma_i\Big) \le \sum_i p_i E(\sigma_i)$$

This states that mixing cannot increase entanglement.
For example, an equal mixture of two maximally entangled
states $|00\rangle + |11\rangle$ and $|00\rangle - |11\rangle$ is a separable state
and consequently contains no entanglement. I did not
include convexity as a separate requirement for an entanglement
measure as it is not completely independent
from E3. This is because E3 and the strong additivity
($E(\rho \otimes \sigma) = E(\rho) + E(\sigma)$) imply convexity:

$$n\sum_i p_i E(\rho_i) = E\Big(\bigotimes_i \rho_i^{\otimes n p_i}\Big) \ge E\Big(\Big(\sum_i p_i \rho_i\Big)^{\otimes n}\Big) = nE\Big(\sum_i p_i \rho_i\Big)$$

where the equalities follow from the strong additivity assumption
and the inequality is a consequence of E3. The
symbol $\rho^{\otimes m}$ means that we have $m$ copies of the state
$\rho$. Nevertheless, it is interesting to point out that any
convex measure that satisfies continuity and weak additivity
has to be bounded from below by the entanglement
of distillation and from above by the entanglement
of formation (Horodecki et al., 2000b). We will see that
"most" entanglement measures can in fact be generated
using the quantum relative entropy.

It is interesting to note that the relative entropy of entanglement
does in fact satisfy both convexity and continuity
(Donald and Horodecki, 1999) although not additivity
(Vollbrecht and Werner, 2000). Furthermore, we
can easily show that it is an upper bound to the entanglement
of distillation. We have that for any pure state $\psi$

$$\min_{\omega \in \mathcal{D}} S(\psi^{\otimes n}||\omega) = \min_{\omega \in \mathcal{D}} -\langle\psi^{\otimes n}|\log\omega|\psi^{\otimes n}\rangle .$$

But, the logarithmic function is concave so that

$$\min_{\omega \in \mathcal{D}} -\langle\psi^{\otimes n}|\log\omega|\psi^{\otimes n}\rangle \ge \min_{\omega \in \mathcal{D}} -\log\langle\psi^{\otimes n}|\omega|\psi^{\otimes n}\rangle$$

However, according to the recent result of the Horodeckis
(Horodecki et al., 1996), since $\omega$ is a disentangled state,
then its fidelity with the maximally entangled state cannot
be larger than the inverse of the half dimension of
that state, so that $\langle\psi^{\otimes n}|\omega|\psi^{\otimes n}\rangle \le 1/2^n$. Thus,

$$\min_{\omega \in \mathcal{D}} S(\psi^{\otimes n}||\omega) \ge -\log(1/2^n) \qquad (69)$$

But we know that this minimum is achievable by the state
$\omega = \rho^{\otimes n}$, where $\rho$ is obtained from $\psi$ by removing the
off-diagonal elements in the Schmidt basis. Consequently, if
we are starting with $n$ copies of state $\sigma$, and obtaining $m$
copies of $\psi$ by LOCC, then

$$D = \frac{m}{n} = \frac{1}{n}\min_{\omega \in \mathcal{D}} S(\psi^{\otimes m}||\omega) \le \frac{1}{n}\min_{\omega \in \mathcal{D}} S(\sigma^{\otimes n}||\omega)$$

where the equality follows from eq. (69) and the inequality
from the fact that the relative entropy is non-increasing
under LOCC (strictly speaking, $D = \lim_{n\to\infty} m/n$ and, of
course, $m$ is a function of $n$, $m = m(n)$). Thus, the
distillable entanglement is bounded from above by the
relative entropy of entanglement.

A similar argument can be given to show that the relative
entropy of entanglement is bounded from above
by the entanglement of formation (Vedral and Plenio,
1998). Since most of the measures of entanglement can
be derived from the relative entropy they will possess
this similar property. In order to see this, we first need
to introduce quantum teleportation.

B. Teleportation



Let us begin by describing quantum teleportation in
the form originally proposed by Bennett et al. (1993).
Suppose that Alice and Bob, who are distant from each
other, wish to implement a teleportation procedure. Initially
they need to share a maximally entangled pair of
qubits. This means that if Alice and Bob both have one
qubit each, then the joint state may for example be:

$$|\Psi_{AB}\rangle = (|0_A\rangle|0_B\rangle + |1_A\rangle|1_B\rangle)/\sqrt{2} , \qquad (70)$$

where the first ket (with subscript A) belongs to Alice
and second (with subscript B) to Bob. Note that this
state is maximally entangled and is different from a statistical
mixture $(|00\rangle\langle 00| + |11\rangle\langle 11|)/2$ which is the most
correlated state allowed by classical physics.

Now suppose that Alice receives a qubit in an unknown
state $|\Phi\rangle = a|0\rangle + b|1\rangle$ and she wants to teleport it to Bob.
The state has to be unknown to her because otherwise
she can just phone Bob up and tell him all the details
of the state, and he can then recreate it on a particle
that he possesses. Given that Alice does not know the
state, she cannot measure it to obtain all the necessary
information to specify it. If she could, this would lead
to a violation of the uncertainty principle. Therefore she has
to resort to using the state $|\Psi_{AB}\rangle$ that she shares with
Bob to transfer her state to him without actually learning
this state. This procedure is what we mean by quantum
teleportation.

I first write out the total state of all three qubits

$$|\Phi_{AB}\rangle := |\Phi\rangle|\Psi_{AB}\rangle = (a|0\rangle + b|1\rangle)(|00\rangle + |11\rangle)/\sqrt{2} .$$

However, the above state can be conveniently written in
a different basis

$$|\Phi_{AB}\rangle = (a|000\rangle + a|011\rangle + b|100\rangle + b|111\rangle)/\sqrt{2}$$

$$= \frac{1}{2}\big[\,|\Phi^+\rangle(a|0\rangle + b|1\rangle) + |\Phi^-\rangle(a|0\rangle - b|1\rangle) + |\Psi^+\rangle(a|1\rangle + b|0\rangle) + |\Psi^-\rangle(a|1\rangle - b|0\rangle)\,\big],$$

where

$$|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2} \qquad (71)$$
$$|\Phi^-\rangle = (|00\rangle - |11\rangle)/\sqrt{2} \qquad (72)$$
$$|\Psi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2} \qquad (73)$$
$$|\Psi^-\rangle = (|01\rangle - |10\rangle)/\sqrt{2} \qquad (74)$$

form an ortho-normal basis of Alice's two qubits (remember
that the first two qubits belong to Alice and the last
qubit belongs to Bob). The above basis is frequently
called the Bell basis. This is a very useful way of writing
the state of Alice's two qubits and Bob's single qubit
because it displays a high degree of correlations between
Alice's and Bob's parts: for every state of Alice's two
qubits (i.e. $|\Phi^+\rangle$, $|\Phi^-\rangle$, $|\Psi^+\rangle$, $|\Psi^-\rangle$) there is a corresponding
state of Bob's qubit. In addition the state of Bob's
qubit in all four cases "looks very much like" the original
qubit that Alice has to teleport to Bob. It is now
straightforward to see how to proceed with the teleportation
protocol (Bennett et al., 1993):

1. Upon receiving the unknown qubit in state $|\Phi\rangle$ Alice
performs projective measurements on her two
qubits in the Bell basis. This means that she will
obtain one of the four Bell states randomly, and
with equal probability.

2. Suppose Alice obtains the state $|\Psi^+\rangle$. Then the
state of all three qubits (Alice + Bob) collapses to
the following state

$$|\Psi^+\rangle(a|1\rangle + b|0\rangle)$$

(the last qubit belongs to Bob as usual). Alice now
has to communicate the result of her measurement
to Bob (over the phone, for example). The point of
this communication is to inform Bob how the state
of his qubit now differs from the state of the qubit
Alice was holding before the Bell measurement.

3. Now Bob has to apply a unitary transformation on
his qubit which simulates a logical NOT operation:
$|0\rangle \to |1\rangle$ and $|1\rangle \to |0\rangle$. He thereby transforms the
state of his qubit into the state $a|0\rangle + b|1\rangle$, which is
precisely the state that Alice had to teleport to him
initially. This completes the protocol. It is easy to
see that if Alice obtained some other Bell state,
then Bob would have to apply some other simple
operation to complete the teleportation. They can
be represented by the Pauli spin matrices.
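
The three steps above can be traced with a small state-vector calculation. The Python sketch below is illustrative rather than a definitive implementation: the function `teleport` and the tables `BELL` and `CORRECTION` are names introduced here, and Bob's correction for each outcome is read off from the Bell-basis expansion given earlier (identity, $\sigma_z$, $\sigma_x$ and $i\sigma_y$).

```python
import math

S = 1 / math.sqrt(2)

# Bell basis of Alice's two qubits, as real amplitudes over |00>, |01>, |10>, |11>.
BELL = {
    "Phi+": [S, 0, 0, S],
    "Phi-": [S, 0, 0, -S],
    "Psi+": [0, S, S, 0],
    "Psi-": [0, S, -S, 0],
}

# Bob's correction for each outcome (Pauli matrices, up to a phase).
CORRECTION = {
    "Phi+": [[1, 0], [0, 1]],    # identity
    "Phi-": [[1, 0], [0, -1]],   # sigma_z
    "Psi+": [[0, 1], [1, 0]],    # sigma_x, the logical NOT of step 3
    "Psi-": [[0, 1], [-1, 0]],   # i sigma_y
}

def teleport(a, b, outcome):
    """Return (probability of `outcome`, Bob's corrected state) for input a|0> + b|1>."""
    # total[2*q1 + q2][q3]: q1 is the unknown qubit, q2 and q3 form the shared pair.
    total = [[0j, 0j] for _ in range(4)]
    for q1, amp in ((0, a), (1, b)):
        for q in (0, 1):
            total[2 * q1 + q][q] = amp * S
    # Step 1: project Alice's qubits onto the observed Bell state (real amplitudes,
    # so no complex conjugation is needed for the bra).
    bell = BELL[outcome]
    bob = [sum(bell[i] * total[i][k] for i in range(4)) for k in range(2)]
    prob = sum(abs(x) ** 2 for x in bob)
    bob = [x / math.sqrt(prob) for x in bob]
    # Steps 2 and 3: Alice sends `outcome` classically; Bob applies the matching unitary.
    u = CORRECTION[outcome]
    corrected = [u[r][0] * bob[0] + u[r][1] * bob[1] for r in range(2)]
    return prob, corrected

a, b = 0.6, 0.8j   # hypothetical amplitudes of the unknown state
results = {name: teleport(a, b, name) for name in BELL}
```

Each outcome occurs with probability 1/4 and, after the correction, Bob holds $a|0\rangle + b|1\rangle$. The correction depends only on the two classical bits identifying the outcome, not on $a$ and $b$.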

An important fact to observe in the above protocol is
that all the operations (Alice's measurements and Bob's
unitary transformations) are local in nature. This means
that there is never any need to perform a (global) transformation
or measurement on all three qubits simultaneously,
which is what allows us to call the above protocol
a genuine teleportation. It is also important that
the operations that Bob performs are independent of the
state that Alice tries to teleport to him. Note also that
the classical communication from Alice to Bob in step 2
above is crucial because otherwise the protocol would be
impossible to execute (there is a deeper reason for this:
if we could perform teleportation without classical communication
then Alice could send messages to Bob faster
than the speed of light, see e.g. Vedral et al. (1997c)).

It is important to observe that the initial
state to be teleported is destroyed immediately after
Alice's measurement, i.e. it becomes maximally mixed, of
the form $(|0\rangle\langle 0| + |1\rangle\langle 1|)/2$. This has to happen since
otherwise Alice and Bob would end up with two qubits
in the same state. So, effectively, they would clone an
unknown quantum state, which is impossible by the laws
of quantum mechanics. This is the no-cloning theorem
of Wootters and Zurek (1982), which is a simple consequence
of linearity of quantum dynamical laws. We also
see that at the end of the protocol the quantum entanglement
of $|\Psi_{AB}\rangle$ is completely destroyed. Does this have
to be the case in general or might we save that state at
the end (by perhaps performing a different teleportation
protocol)? The answer is that it does have to be the case
(Plenio and Vedral, 1998), and the reason is that if this
were not the case, then entanglement could increase under
LOCC, which as we have seen is prohibited by definition.

Teleportation has been experimentally performed in 
three different set-ups (Bouwmeester et al., 1997). It 
will now be used to link the three measures of entangle- 
ment. I will show that all different measures of entangle- 
ment can be understood as special cases of the relative 
entropy of entanglement (Henderson and Vedral, 2000). 
This unification relies on adding an ancilla, which I will 
call a memory system and which will help us keep track 
of the various decompositions of a given bi-partite den- 
sity matrix. How much access is available to this memory 
determines which measure of entanglement is used. 



C. Measures of entanglement from relative entropy 

Suppose that Alice and Bob share a state described
by the density matrix $\rho_{AB}$. The state $\rho_{AB}$ has
an infinite number of different decompositions
$\varepsilon = \{|\psi^i_{AB}\rangle, p_i\}$ into pure states $|\psi^i_{AB}\rangle$, with probabilities
$p_i$. We denote the mixed state $\rho_{AB}$ written in
decomposition $\varepsilon$ by

$$\rho^{\varepsilon}_{AB} = \sum_i p_i |\psi^i_{AB}\rangle\langle\psi^i_{AB}| \qquad (75)$$

As we have seen, measures of entanglement are associated
with formation and distillation of pure and mixed
entangled states. The known relationships between the
different measures of entanglement for mixed states are
$E_D(\rho_{AB}) \le E_{RE}(\rho_{AB}) \le E_F(\rho_{AB})$ (Vedral and Plenio,
1998). Equality holds for pure states, where all the
measures reduce to the von Neumann entropy, $S(\rho_A) = S(\rho_B)$.

FIG. 7. This figure illustrates formation of a state by
LOCC and with the help of teleportation. First, Alice creates
the joint state of subsystems A and B locally. Then, she
performs the quantum data compression on the subsystem B
and teleports the compressed state to Bob. Finally, Bob
decompresses the received state and hence Alice and Bob end
up sharing the joint state of A and B initially prepared by
Alice.



Formation of an ensemble of $n$ non-maximally entangled
pure states, $\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}|$, is achieved by the
following protocol. Alice first prepares the states she
would like to share with Bob locally. She then uses
Schumacher compression (Schumacher, 1995) to compress
subsystem B into $nS(\rho_B)$ states. Subsystem B is
then teleported to Bob using $nS(\rho_B)$ maximally entangled
pairs. Bob decompresses the states he receives and
so ends up sharing $n$ copies of $\rho_{AB}$ with Alice. The
entanglement of formation is therefore $E_F(\rho_{AB}) = S(\rho_B)$.
For pure states, this process requires no classical communication
in the asymptotic limit (Lo and Popescu, 1999).
The reverse process of distillation is accomplished using
the Schmidt projection method (Bennett et al., 1996a),
which allows $nS(\rho_B)$ maximally entangled pairs to be
distilled in the limit as $n$ becomes very large. No classical
communication between the separated parties is required.
Therefore pure states are fully inter-convertible
in the asymptotic limit.

The situation for mixed states is more complex. When
any mixed state, denoted by Eq. (75), is created, it may
be imagined to be part of an extended system whose
state is pure. The pure states $|\psi^i_{AB}\rangle$ in the mixture may
be regarded as correlated to orthogonal states $|m_i\rangle$ of a
memory M. The extended system is in the pure state
$|\psi_{MAB}\rangle = \sum_i \sqrt{p_i}\,|m_i\rangle|\psi^i_{AB}\rangle$. If we have no access to
the memory system, we trace over it to obtain the mixed
state in Eq. (75). In fact, the lack of access to the memory
is of a completely general nature. It may be due to
interaction with another inaccessible system, or it may
be due to an intrinsic loss of information. The results I
will present are universally valid and do not depend on
the nature of the information loss. We will see that the
amount of entanglement involved in the different entanglement
manipulations of mixed states depends on the
accessibility of the information in the memory at different
stages. Note that a unitary operation on $|\psi_{MAB}\rangle$
will convert it into another pure state $|\phi_{MAB}\rangle$ with the
same entanglement, and tracing over the memory yields
a different decomposition of the mixed state. Reduction
of the pure state to the mixed state may be regarded as
due to a projection-valued measurement on the memory
with operators $\{E_i = |m_i\rangle\langle m_i|\}$.

Consider first the protocol of formation by means of
which Alice and Bob come to share an ensemble of $n$
mixed states $\rho_{AB}$ as in Fig. 7. Alice first creates the
mixed states locally by preparing a collection of $n$ states
in a particular decomposition, $\varepsilon = \{|\psi^i_{AB}\rangle, p_i\}$, by
making $np_i$ copies of each pure state $|\psi^i_{AB}\rangle$. At the same
time we may imagine a memory system entangled to the
pure states to be generated, which keeps track of the identity
of each member of the ensemble. I consider first the
case where the state of subsystems A and B together with
the memory is pure. Later, I will consider the situation in
which Alice's memory is decohered. There are then three
ways for her to share these states with Bob. First of all,
she may simply compress subsystem B to $nS(\rho_B)$ states,
and teleport these to Bob using $nS(\rho_B)$ maximally entangled
pairs. The choice of which subsystem to teleport
is made so as to minimise the amount of entanglement
required, so that $S(\rho_B) \le S(\rho_A)$. The teleportation in
this case would require no classical communication in the
asymptotic limit, just as for pure states (Lo and Popescu,
1999). The state of the whole system which is created by
this process is an ensemble of pure states $|\psi_{MAB}\rangle$, where
subsystems M and A are on Alice's side and subsystem
B is on Bob's side. In terms of entanglement resources,
however, this process is not the most efficient way for
Alice to send the states to Bob. She may do it more efficiently
by using the memory system of $|\psi_{MAB}\rangle$ to identify
blocks of $np_i$ members in each pure state $|\psi^i_{AB}\rangle$ and
applying compression to each block to give $np_i S(\rho^i_B)$ states.
Then the total number of maximally entangled pairs required
to teleport these states to Bob is $n\sum_i p_i S(\rho^i_B)$,
which is no greater than $nS(\rho_B)$, by concavity of the
entropy. The amount of entanglement required clearly
depends on the decomposition of the mixed state $\rho_{AB}$.
In order to decompress these states, Bob must also be
able to identify which members of the ensemble are in
which state. Therefore Alice must also send him the
memory system. She now has two options. She may either
teleport the memory to Bob, which would use more
entanglement resources, or she may communicate the
information in the memory classically, with no further
use of entanglement. When Alice uses the minimum entanglement
decomposition, $\varepsilon = \{|\psi^i_{AB}\rangle, p_i\}$, this
process, originally introduced by Bennett et al. (1996b),
makes the most efficient use of entanglement, consuming
only the entanglement of formation of the mixed state,
$E_F(\rho_{AB}) = \sum_i p_i S(\rho^i_B)$. We may think of the classical
communication between Alice and Bob in one of two
equivalent ways. Alice may either measure the memory
locally to decohere it, and then send the result to
Bob classically, or she may send the memory through a
completely decohering quantum channel. Since Alice and
Bob have no access to the channel, the state of the whole
system which is created by this process is the mixed state

$$\rho_{ABM} = \sum_i p_i |\psi^i_{AB}\rangle\langle\psi^i_{AB}| \otimes |m_i\rangle\langle m_i| \qquad (76)$$

where Bob's memory is classically correlated with the AB subsystem.
Bob is then able to decompress his states using the memory
to identify members of the ensemble.

Once the collection of $n$ pairs is shared between Alice
and Bob, it is converted into an ensemble of $n$ mixed
states $\rho_{AB}$ by destroying access to the memory which
contains the information about the state of any particular
member of the ensemble. It is the loss of this
information which is responsible for the fact that entanglement
of distillation is lower than entanglement of formation,
since it is not available to parties carrying out
the distillation. If Alice and Bob, who do have access
to the memory, were to carry out the distillation, they
could obtain as much entanglement from the ensemble
as was required to form it. In the case where Alice
and Bob share an ensemble of the pure state $|\psi_{MAB}\rangle$,
they would simply apply the Schmidt projection method
(Bennett et al., 1996a). The relative entropy of entanglement
gives the upper bound to distillable entanglement,
$E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|) = S(\rho_B)$, which is the same
as the amount of entanglement required to create the
ensemble of pure states, as described above. Here MA and
B are spatially separated subsystems on which joint
operations may not be performed. In my notation, I use a
colon to separate the local subsystems.

On the other hand, if Alice used the least entanglement
for producing an ensemble of the mixed state $\rho_{AB}$,
together with classical communication, the state of the
whole system is an ensemble of the mixed state $\rho_{ABM}$,
and the process is still reversible. Because of the classical
correlation to the states $|\psi^i_{AB}\rangle$, Alice and Bob may
identify blocks of members in each pure state $|\psi^i_{AB}\rangle$, and
apply the Schmidt projection method to them, giving
$np_i S(\rho^i_B)$ maximally entangled pairs, and hence a total
entanglement of distillation of $\sum_i p_i S(\rho^i_B)$. The relative
entropy of entanglement again quantifies the amount of
distillable entanglement from the state $\rho_{ABM}$ and is given
by $E_{RE}(\rho^{\varepsilon}_{A:(BM)}) = \min_{\sigma_{ABM} \in D} S(\rho_{ABM}\|\sigma_{ABM})$. The
disentangled state which minimises the relative entropy is
$\sigma_{ABM} = \sum_i p_i\, \sigma^i_{AB} \otimes |m_i\rangle\langle m_i|$, where $\sigma^i_{AB}$ is obtained
from $|\psi^i_{AB}\rangle\langle\psi^i_{AB}|$ by deleting the off-diagonal elements in
the Schmidt basis. This is the minimum because the state
$\rho_{MAB}$ is a mixture of the orthogonal states $|m_i\rangle|\psi^i_{AB}\rangle$,
and for a pure state $|\psi^i_{AB}\rangle$, the disentangled state that
minimises the relative entropy is $\sigma^i_{AB}$. The minimum
relative entropy of the extended system is then

$$S(\rho_{ABM}\|\sigma_{ABM}) = \sum_i p_i S(\rho^i_B)$$

This relative entropy, $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$, has previously been
called the 'entanglement of projection' (Garisto and
Hardy, 1999), because the measurement on the memory
projects the pure state of the full system into a particular
decomposition. The minimum of $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$ over
all decompositions is equal to the entanglement of
formation of $\rho_{AB}$. However, Alice and Bob may choose
to create the state $\rho_{AB}$ by using a decomposition with
higher entanglement than the entanglement of formation.
The maximum of $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$ over all possible
decompositions is called the 'entanglement of assistance'
of $\rho_{AB}$ (DiVicenzo et al., 1998). Because $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$
is a relative entropy, it is invariant under local
operations and non-increasing under general operations,
properties which are conditions for a good measure of
entanglement (Vedral and Plenio, 1998). However, unlike
$E_{RE}(\rho_{AB})$ and $E_F(\rho_{AB})$, it is not zero for completely
disentangled states. In this sense, the relative entropy of
entanglement, $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$, defines a class of
entanglement measures interpolating between the entanglement
of formation and entanglement of assistance. Note that
an upper bound for the entanglement of assistance, $E_A$,
can be shown using concavity (DiVicenzo et al., 1998) to
be $E_A(\rho_{AB}) \le \min[S(\rho_A), S(\rho_B)]$. This bound can also
be shown from the fact that the distillable entanglement
from any decomposition, $E_{RE}(\rho^{\varepsilon}_{A:(BM)}) \le E_A(\rho_{AB})$,
cannot be greater than the entanglement of the original pure
state.

Note that here we are really creating a state $\rho^{\otimes n} =
\rho \otimes \rho \otimes \cdots \otimes \rho$. The entanglement of formation of such a state
is, strictly speaking, given by $E_F(\rho^{\otimes n})$; so the
entanglement of formation per single pair is $E_F(\rho^{\otimes n})/n$. It is
at present not clear if this is the same as $E_F(\rho)$ in general,
i.e. whether the entanglement of formation is additive.
Bearing this in mind we continue our discussion, whose
conclusions will not depend on the validity of the
additivity assumption of the entanglement of formation (for
more on this issue see for example Hayden et al., 2000).

We may also derive relative entropy measures that
interpolate between the relative entropy of entanglement
and the entanglement of formation (Horodecki et al.,
2000b) by considering non-orthogonal measurements on
the memory. First of all, the fact that the entanglement
of formation is in general greater than the upper bound
for entanglement of distillation emerges as a property
of the relative entropy, namely that it cannot increase
under the local operation of tracing one subsystem (this
is property F2 of the quantum relative entropy given in
Section II) (Lindblad, 1974),

$$E_F(\rho_{AB}) = \min_{\sigma_{ABM} \in D} S(\rho_{ABM}\|\sigma_{ABM}) \ge \min_{\sigma_{AB} \in D} S(\rho_{AB}\|\sigma_{AB})$$ (77)

In general, the loss of the information in the memory may
be regarded as a result of an imperfect classical channel.
This is equivalent to Alice making a non-orthogonal
measurement on the memory, and sending the result
to Bob. In the most general case, $\{E_i = A_i A_i^{\dagger}\}$ is a
POVM (positive operator valued measure; loosely speaking,
this is a CP map as in eq. (13) where all the individual
outcomes are recorded) performed on the memory.
The decomposition corresponding to this measurement is
composed of mixed states, $\xi = \{q_i, \mathrm{Tr}_M(A_i \rho_{MAB} A_i^{\dagger})\}$,
where $q_i = \mathrm{Tr}(A_i \rho_{MAB} A_i^{\dagger})$. The relative entropy of
entanglement of the state $\rho^{\xi}_{MAB}$, when $\xi$ is a decomposition
of $\rho_{AB}$ resulting from a non-orthogonal measurement on
M, defines a class of entanglement measures interpolating
between the relative entropy of entanglement and the
entanglement of formation of the state $\rho_{AB}$. In the
extreme case where the measurement gives no information
about the state $\rho_{AB}$, $E_{RE}(\rho^{\xi}_{A:(BM)})$ becomes the relative
entropy of entanglement of the state $\rho_{AB}$ itself. In
between, the measurement gives partial information. So far,
I have shown that the measures interpolating between
entanglement of assistance and entanglement of formation
result from making orthogonal measurements on
preparations of the pure state $|\psi_{MAB}\rangle$ in different bases. I note
that they may equally be achieved by using the
preparation associated with entanglement of assistance, and
making increasingly non-orthogonal measurements.



D. Classical Information and Quantum correlations 

The loss of entanglement may be related to the loss
of information in the memory. There are two stages at
which distillable entanglement is lost. The first is in the
conversion of the pure state $|\psi_{MAB}\rangle$ into a mixed state
$\rho_{ABM}$. This happens because Alice uses a classical
channel to communicate the memory to Bob. The second is
due to the loss of the memory, M, taking the state $\rho_{ABM}$
to $\rho_{AB}$. The amount of information lost may be
quantified by the difference in mutual information between
the respective states. Mutual information is a measure
of correlations between the memory M and the system
AB, giving the amount of information about AB which
may be obtained from a measurement on M. The quantum
mutual information between M and AB is defined as
$I_Q(\rho_{M:(AB)}) = S(\rho_M) + S(\rho_{AB}) - S(\rho_{MAB})$. The mutual
information loss in going from the pure state $|\psi_{MAB}\rangle$ to
the mixed state in Eq. (76) is $\Delta I_Q = S(\rho_{AB})$. There is
a corresponding reduction in the relative entropy of
entanglement, from the entanglement of the original pure
state, $E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|)$, to the entanglement of
the mixed state, $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$, for all decompositions $\varepsilon$
arising as the result of an orthogonal measurement on the
memory. It is possible to prove, using the non-increase
of relative entropy under local operations, that when the
mutual information loss is added to the relative entropy
of entanglement of the mixed state, $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$, the
result is no smaller than the relative entropy of entanglement
of the original pure state, $E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|)$
(Henderson and Vedral, 2000). The strongest case,
which occurs when $E_{RE}(\rho^{\varepsilon}_{A:(BM)}) = E_F(\rho_{AB})$, is:

$$E_{RE}(|\psi_{(MA):B}\rangle\langle\psi_{(MA):B}|) \le E_F(\rho_{AB}) + S(\rho_{AB})$$ (78)

A similar result may be proved for the second loss,
due to loss of the memory (Henderson and Vedral,
2000). Again the mutual information loss is $\Delta I_Q =
S(\rho_{AB})$. The relative entropy of entanglement is
reduced from $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$, for any decomposition $\varepsilon$
resulting from an orthogonal measurement on the memory,
to $E_{RE}(\rho_{AB})$, the relative entropy of entanglement of
the state $\rho_{AB}$ with no memory. When the mutual
information loss is added to $E_{RE}(\rho_{AB})$, the result is no smaller
than $E_{RE}(\rho^{\varepsilon}_{A:(BM)})$. In this case, the result is strongest
for $E_{RE}(\rho^{\varepsilon}_{A:(BM)}) = E_A(\rho_{AB})$:

$$E_A(\rho_{AB}) \le E_{RE}(\rho_{AB}) + S(\rho_{AB})$$ (79)

Notice that if $\rho_{AB}$ is a pure state, then $S(\rho_{AB}) = 0$,
and equality holds. Inequalities (78) and (79) provide
lower bounds for $E_F(\rho_{AB})$ and $E_{RE}(\rho_{AB})$ respectively.
They are of a form typical of irreversible processes in that
restoring the information in M is not sufficient to restore
the original correlations between M and AB. In particular,
they express that the loss of entanglement between
Alice and Bob at each stage must be accompanied by an
even greater reduction in mutual information between the
memory and subsystems AB. The general result can be
derived from Donald's equality (Donald, 1986). We have
in general that for any $\sigma$ and $\rho = \sum_i p_i \rho_i$ the following
is true

$$S(\rho\|\sigma) + \sum_i p_i S(\rho_i\|\rho) = \sum_i p_i S(\rho_i\|\sigma)$$

Suppose that $E(\rho) = S(\rho\|\sigma)$, i.e. that $\sigma$ is the
disentangled state closest to $\rho$. Then, since $E(\rho_i) \le
S(\rho_i\|\sigma)$, we have the following inequality

$$E(\rho) + \sum_i p_i S(\rho_i\|\rho) \ge \sum_i p_i E(\rho_i)$$

Thus, the loss of entanglement in $\{p_i, \rho_i\} \to \rho$ is bounded
from above by the Holevo information

$$\sum_i p_i E(\rho_i) - E(\rho) \le \sum_i p_i S(\rho_i\|\rho)$$ (80)

This is a physically pleasing property of entanglement. It
says that the amount of information lost always exceeds
the lost entanglement, which indicates that entanglement
stores only a part of information; the rest, of course,
is stored in classical correlations (see also Eisert et al.,
2000, who consider a similar problem, although not in
the full generality of the above analysis).
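
Donald's equality above can be checked numerically. The following
is a minimal sketch in Python, assuming NumPy is available; the
helper names (random_density_matrix, relative_entropy, donald_sides)
are illustrative and are not taken from the text. Random full-rank
density matrices are used so that all logarithms are finite.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Return a random full-rank density matrix of the given dimension."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T              # positive definite with probability one
    return rho / np.trace(rho).real

def matrix_log2(rho):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log2(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """S(rho||sigma) = Tr[rho (log rho - log sigma)], in bits."""
    return np.trace(rho @ (matrix_log2(rho) - matrix_log2(sigma))).real

def donald_sides(probs, states, sigma):
    """Return both sides of Donald's equality for rho = sum_i p_i rho_i."""
    rho = sum(p * r for p, r in zip(probs, states))
    holevo = sum(p * relative_entropy(r, rho) for p, r in zip(probs, states))
    lhs = relative_entropy(rho, sigma) + holevo
    rhs = sum(p * relative_entropy(r, sigma) for p, r in zip(probs, states))
    return lhs, rhs

rng = np.random.default_rng(0)
states = [random_density_matrix(4, rng) for _ in range(3)]
sigma = random_density_matrix(4, rng)
lhs, rhs = donald_sides([0.2, 0.3, 0.5], states, sigma)
print(lhs, rhs)   # the two numbers should agree up to rounding error
```

The term called holevo in the sketch is the right-hand side of
inequality (80), so the same function can be used to compare an
entanglement loss against the Holevo information once a routine
for $E(\rho)$ is supplied.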

In summary, the relative entropy of entanglement of
the state $\rho_{AB}$ depends only on the density matrix $\rho_{AB}$,
and gives an upper bound to the entanglement of
distillation. The other measures of entanglement, which are
given by relative entropies of an extended system, all
depend on how the information in the memory is used, or
how the density matrix is decomposed. There are
numerous decompositions of any bipartite mixed state into
a set of states $\rho_i$ with probability $p_i$. The average
entanglement of states in each decomposition is given by the
relative entropy of entanglement of the system extended
by a memory whose orthogonal states are classically
correlated to the states of the decomposition. This
correlation records which state $\rho_i$ any member of an ensemble of
mixed states $\rho^{\otimes n}$ is in. It is available to parties involved
in formation of the mixed state, but is not accessible to
parties carrying out distillation. When the classical
information is fully available, different decompositions give
rise to different amounts of distillable entanglement, the
highest being entanglement of assistance and the
lowest, entanglement of formation. If access to the classical
record is reduced, the amount of distillable entanglement
is reduced. In the limit where no information is available,
the upper bound to the distillable entanglement is given
by the relative entropy of entanglement of the state $\rho_{AB}$
itself, without the extension of the classical memory.

I close this section by discussing generalisations to
more than two subsystems. First of all it is not at all
clear how to perform this in the case of entanglement of
formation and distillation. The former just does not
have a natural generalisation and, for the latter, it is
not clear what states we should be distilling when we have
three or more parties. The relative entropy of entanglement,
on the other hand, does not suffer from this problem
(Vedral and Plenio, 1998; Vedral et al., 1997b). Its
definition for N parties would be $E_{RE}(\sigma) := \min_{\rho \in D} S(\sigma\|\rho)$,
where $\rho = \sum_i p_i\, \rho^i_1 \otimes \rho^i_2 \otimes \cdots \otimes \rho^i_N$.

I will now use the knowledge we have gained of classical 
and quantum correlations to describe quantum computa- 
tion. It will be seen, perhaps somewhat surprisingly, that 
classical correlations will play a more prominent role than 
quantum correlations in the speed-up of certain quantum 
algorithms. 

V. QUANTUM COMPUTATION 

A quantum computer is a physical system that can 
accept input states which represent a coherent superpo- 
sition of many different possible basis states and subse- 
quently evolve them into a corresponding superposition 
of outputs. Computation, i.e. a sequence of unitary 
transformations, affects simultaneously each element of 
the superposition, generating a massive parallel data
processing albeit within one piece of quantum hardware. In
this way quantum computers can efficiently solve some
problems that are believed to be intractable on classical
computers (Deutsch and Josza, 1992) (the best example
is Shor's factorisation algorithm (Shor, 1996)). Therefore
the advantage of a quantum computer lies in the
exploitation of the phenomenon of superposition. The 
great importance of the quantum theory of computation 
is in the fact that it reveals the fundamental connections 
between the laws of physics and the nature of computa- 
tion (Deutsch, 1998). 

In order to understand the efficiency of computer algo- 
rithms, we have to discuss the theory of computational 
complexity. I will only mention the basics, but a more 
detailed account can be found in e.g. (Papadimitriou, 
1995). Computational complexity concerns the difficulty 
of solving certain problems, such as for example multipli- 
cation of two numbers, finding the minimum of a given 
function and so on. Complexity theory divides problems 
into two basic categories: 

1. easy problems: the time of computation $T$ is a
polynomial function of the size of the input $l$, i.e.
$T = c_n l^n + \ldots + c_1 l + c_0$, where the coefficients $c$ are
determined by the problem.

2. hard problems: the time of computation is an
exponential function of the size of the input (e.g.
$T = 2^{cl}$, where $c$ is problem dependent).

The size of the input is always measured in bits (qubits).
For example, if we are to store the number 15, then we
need 4 bits. In general, to store a number $N$ we need
about $l = \log N$ bits, where the base of the logarithm is 2.

The division of problems into 'easy' and 'hard' is, of
course, very rough. First of all, in computation, apart
from time, there are other resources which might matter,
such as space, energy and so on. If time grows polynomially,
but we require an exponentially increasing energy,
then the problem is clearly difficult. Also, suppose that
the time complexity of one problem is $10^{10} n$ and of
another one is $10^{-10}\, 2^n$. Then for small $n$ (say $n = 10$), the
second algorithm, in spite of being exponential, is clearly
more efficient. These two issues exemplify that the
division into hard and easy problems is not without its own
problems. However, this classification system is very simple
to put into practice and does illuminate many different
aspects of computational problems, which is why it is
so widely used. I refer the reader to the book by Garey
and Johnson (1979) which presents an introduction to
hard problems and their detailed classification.
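
The comparison of the two running times above can be made concrete
with a few lines of Python. This is only an illustrative sketch: the
function names are hypothetical, and the two cost functions are the
ones quoted in the text, compared with exact integer arithmetic.

```python
def bits_needed(number):
    """Number of bits required to store a positive integer."""
    return number.bit_length()

def exponential_is_slower(n):
    """True if 10^-10 * 2^n exceeds 10^10 * n.

    Both costs are multiplied by 10^10 so that the comparison
    uses exact integers.
    """
    return 2**n > 10**20 * n

def crossover_size():
    """Smallest input size at which the exponential cost overtakes."""
    n = 1
    while not exponential_is_slower(n):
        n += 1
    return n

print(bits_needed(15))            # storing 15 takes 4 bits
print(exponential_is_slower(10))  # False: exponential is cheaper at n = 10
print(crossover_size())
```

For every input size below the value returned by crossover_size,
the exponential algorithm of this example is the faster one, which
is the sense in which the easy/hard division is rough.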



There is a great simplification in understanding quantum
computation: a quantum computer is formally
equivalent to a multiparticle "Mach-Zehnder like"
interferometer (Cleve et al., 1997). I first present the simplest
kind of interferometer in terms of its function as a simple
computer. We see from Fig. 8 that the path of the
photon is in fact a quantum bit in the sense that the
photon can be in a superposition of the two paths. The first
beam splitter acts as the unitary evolution $|0\rangle \to |0\rangle + |1\rangle$,
which is known as the Hadamard gate. Next is the phase
shift which has the following effect

$$|0\rangle \to e^{i\phi(0)}|0\rangle$$
$$|1\rangle \to e^{i\phi(1)}|1\rangle$$

At the end we have another beam splitter and two detectors
measuring contributions to the states $|0\rangle$ and $|1\rangle$. The
corresponding probabilities of detection are

$$P_0 = \cos^2 \frac{\phi(0) - \phi(1)}{2}$$
$$P_1 = \sin^2 \frac{\phi(0) - \phi(1)}{2}$$

If, for example, $\phi(0) = \phi(1)$, then only detector 0
will be registering counts. If, on the other hand,
$\phi(0) = \phi(1) \pm \pi$, then only detector 1 will be registering
counts. These two situations are basically identical to
what is known as Deutsch's algorithm (Deutsch, 1985),
the first algorithm to give an indication that quantum
computers are more powerful than their classical
counterparts. This algorithm has also been implemented
experimentally in Nuclear Magnetic Resonance (NMR) (Jones
and Mosca, 1998a).

FIG. 8. The Mach-Zehnder interferometer. A photon is split
at a beam-splitter and can take two different paths. In each of
the paths we have a different phase introduced to the photon
state, so that, after it encounters the second beam-splitter, the
probabilities of detection in two branches have the sinusoidal
dependence on the phase difference. In terms of quantum
computation, the beam-splitter implements the Hadamard
transform and the whole interferometer can be seen as
implementing Deutsch's algorithm (see text for explanation).

A. Deutsch's algorithm

Deutsch's problem (Deutsch, 1985) is the simplest
possible example which illustrates the advantages of quantum
computation. The problem is the following. Suppose
that we are given a binary function of a binary variable
$f : \{0, 1\} \to \{0, 1\}$. Thus, $f(0)$ can either be 0 or 1, and
$f(1)$ likewise can either be 0 or 1, giving altogether four
possibilities. However, suppose that we are not interested
in the particular values of the function at 0 and 1, but
we need to know whether the function is: 1) constant,
i.e. $f(0) = f(1)$, or 2) varying, i.e. $f(0) \ne f(1)$. Now
Deutsch poses the following task: by computing $f$ only
once determine whether it is constant or varying. This
kind of problem is generally referred to as a promise
algorithm, because one property out of a certain number
of properties is initially promised to hold, and our task
is to determine computationally which one holds (see
also (Deutsch and Josza, 1992) for other similar types
of promise algorithms).

First of all, classically finding out in one step whether
a function is constant or varying is clearly impossible.
We need to compute $f(0)$ and then compute $f(1)$ in
order to compare them. There is no way out of this double
evaluation. Quantum mechanically, however, there is a
simple method to achieve this task by computing $f$ only
once! Two qubits are needed for the computation. In
reality only one qubit is really needed, but the second
qubit is there to implement the necessary transformation.
We can imagine that the first qubit is the input to
the quantum computer whose internal (hardware) part
is represented by the second qubit. The computer itself
will implement the following transformation on the two
qubits (we perform this fully quantum mechanically, i.e.
we are now not using "classical" devices such as beam
splitters):

$$|x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$$ (81)

where $x$ is the input qubit and $y$ the hardware, as
depicted in Fig. 9. Note that this transformation is
reversible and thus there is a unitary transformation to
implement it (but we will not pay any attention to that
at the moment, as we are only interested here in the
basic principle). Note also that $f$ has been used only once.
The trick is to prepare the input in such a state that we
make use of quantum superpositions. Let us have at the
input

$$|x\rangle|y\rangle = (|0\rangle + |1\rangle)(|0\rangle - |1\rangle)$$ (82)

where $|x\rangle$ is the actual input and $|y\rangle$ is part of the
computer hardware. Thus before the transformation is
implemented, the state of the computer is in an equal
superposition of all four basis states, which we obtain by
simply expanding the state in eq. (82),

$$|\Psi_{in}\rangle = |00\rangle - |01\rangle + |10\rangle - |11\rangle$$

Note that there are negative phase factors before the
second and fourth term. When this state now undergoes the
transformation in eq. (81), we have the following output
state

$$|\Psi_{out}\rangle = |0, f(0)\rangle - |0, \bar{f}(0)\rangle + |1, f(1)\rangle - |1, \bar{f}(1)\rangle$$
$$= |0\rangle(|f(0)\rangle - |\bar{f}(0)\rangle) + |1\rangle(|f(1)\rangle - |\bar{f}(1)\rangle)$$

where the bar indicates the opposite of that value, so
that, for example, $\bar{0} = 1$. Now we see where the power of
quantum computers is fully realised: each of the
components in the superposition of $|\Psi_{in}\rangle$ underwent the same
evolution of eq. (81) "simultaneously", leading to the
powerful "quantum parallelism" (Deutsch, 1985). This
feature is true for quantum computation in general. Let
us look at the two possibilities now:

1. if $f$ is constant then

$$|\Psi_{out}\rangle = (|0\rangle + |1\rangle)(|f(0)\rangle - |\bar{f}(0)\rangle)$$

2. if $f$ is varying then

$$|\Psi_{out}\rangle = (|0\rangle - |1\rangle)(|f(0)\rangle - |\bar{f}(0)\rangle)$$

Note that the output qubit (the first qubit) emerges in
two different orthogonal states, depending on the type
of $f$. These two states can be distinguished with 100
percent efficiency. This is easy to see if we first
perform a Hadamard transformation on this qubit, leading
to the state $|0\rangle$ if the function is constant, and to the
state $|1\rangle$ if the function is varying. Now a single
projective measurement in the $\{|0\rangle, |1\rangle\}$ basis determines the type
of the function. Therefore, unlike their classical counterparts,
quantum computers can solve Deutsch's problem with a
single evaluation of $f$.
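
The steps of Deutsch's algorithm described above can be traced in a
small classical simulation. The sketch below is illustrative only:
it stores the four amplitudes of the two-qubit state in a list
indexed by 2x + y, and the function names are my own rather than
part of any library. Note that the classical simulation necessarily
evaluates f on every basis state; it is the quantum device that
applies the transformation of eq. (81) once.

```python
import math

def oracle(state, f):
    """Apply |x>|y> -> |x>|y XOR f(x)> to a list of four amplitudes."""
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (y ^ f(x))] += state[2 * x + y]
    return out

def hadamard_on_first(state):
    """Apply a Hadamard gate to the first qubit of a two-qubit state."""
    s = 1 / math.sqrt(2)
    out = [0.0] * 4
    for y in (0, 1):
        a0, a1 = state[y], state[2 + y]   # amplitudes of |0,y> and |1,y>
        out[y] = s * (a0 + a1)
        out[2 + y] = s * (a0 - a1)
    return out

def deutsch(f):
    """Classify f as 'constant' or 'varying' from one oracle application."""
    # Normalised version of the input (|0> + |1>)(|0> - |1>) of eq. (82).
    state = [0.5, -0.5, 0.5, -0.5]
    state = oracle(state, f)
    state = hadamard_on_first(state)
    # Probability of finding the first qubit in |1>.
    p1 = state[2] ** 2 + state[3] ** 2
    return "varying" if p1 > 0.5 else "constant"

for name, f in [("f=0", lambda x: 0), ("f=1", lambda x: 1),
                ("f=x", lambda x: x), ("f=not x", lambda x: 1 - x)]:
    print(name, deutsch(f))
```

In all four cases the probability p1 is exactly 0 or 1 up to
rounding, which reflects the orthogonality of the two possible
output states of the first qubit.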

Let us now rephrase this in terms of phase shifts to
emphasise its underlying identity with the above
Mach-Zehnder interferometer. The transformation of the two
registers is the following

$$|x\rangle|-\rangle \to e^{i\pi f(x)}|x\rangle|-\rangle$$

where $x = 0, 1$ and $|-\rangle = |0\rangle - |1\rangle$. Thus, the first qubit
is like a photon in the interferometer, receiving a
conditional phase shift depending on its state (0 or 1). It is
left to the reader to show that this transformation is
formally identical to the above analysis. The second qubit
is there just to implement the phase shift quantum
mechanically. It should be emphasised that this quantum
computation, although extremely simple, contains all the
main features of successful quantum algorithms: it can
be shown that all quantum computations are just more
complicated variations of Deutsch's problem (Cleve et al.,
1997). We will use the introduction of a phase shift as a
basic element of a quantum computer and relate this to
the notion of distinguishability and relative entropy.
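
The identification with the interferometer can be illustrated
numerically. The following sketch assumes the phase assignment
$\phi(x) = \pi f(x)$ read off from the transformation above and
propagates a single path qubit through beam splitter, phase shift
and beam splitter; the function names are hypothetical.

```python
import cmath
import math

def detection_probabilities(phi0, phi1):
    """Return (P0, P1) for a Mach-Zehnder interferometer with path phases."""
    s = 1 / math.sqrt(2)
    # First beam splitter (Hadamard) acting on the input path |0>.
    a0, a1 = s, s
    # Path-dependent phase shifts.
    a0 *= cmath.exp(1j * phi0)
    a1 *= cmath.exp(1j * phi1)
    # Second beam splitter (Hadamard).
    b0, b1 = s * (a0 + a1), s * (a0 - a1)
    return abs(b0) ** 2, abs(b1) ** 2

def deutsch_phases(f):
    """Phases phi(x) = pi * f(x) acquired by the first qubit."""
    return math.pi * f(0), math.pi * f(1)

# Generic phases reproduce the cos^2 law for detector 0.
p0, p1 = detection_probabilities(0.3, 1.1)
print(p0, math.cos((0.3 - 1.1) / 2) ** 2)

# A constant function fires detector 0, a varying one detector 1.
print(detection_probabilities(*deutsch_phases(lambda x: 1)))
print(detection_probabilities(*deutsch_phases(lambda x: x)))
```

Only the phase difference enters the probabilities, which is why a
constant function (equal phases) and a varying function (phases
differing by $\pi$) correspond to the two extreme settings of the
interferometer.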

Note one important aspect: the input could also be of
the form $|-\rangle|-\rangle$. A constant function would then lead
to the state $|-\rangle|-\rangle$ and a varying function would lead
to $|+\rangle|-\rangle$. So, the $|+\rangle$ and $|-\rangle$ are equally good as
input states of the first qubit and both lead to quantum
speed-up. Their equal mixture, on the other hand, is
not: the output would then be the equal mixture
$|+\rangle\langle+| + |-\rangle\langle-|$ no matter whether $f(0) = f(1)$ or
$f(0) \ne f(1)$, i.e. the two possibilities would be
indistinguishable. Thus for a quantum algorithm to work well, we
need the first register to be highly correlated to the two
different types of functions. So, if the output state of the
first qubit $\rho_1$ indicates that we have a constant function
and $\rho_2$ that we have a varying function, then the
efficiency of Deutsch's algorithm depends on how well we
can distinguish the two states $\rho_1$ and $\rho_2$. This is given
by the Holevo bound

$$H = S(\rho) - \frac{1}{2}\left(S(\rho_1) + S(\rho_2)\right)$$

where $\rho = \frac{1}{2}(\rho_1 + \rho_2)$. Thus if $\rho_1 = \rho_2$, then $H = 0$ and
the quantum algorithm has no speed-up over the classical
one. At the other extreme, if $\rho_1$ and $\rho_2$ are pure
and orthogonal, then $H = 1$ and the computation gives
the right result in one step. In between these two
extremes lie all other computations with varying degree of
efficiency as quantified by the Holevo bound. Note that
these are purely classical correlations and that there is no
entanglement between the first and the second qubit. In
fact the Holevo bound is the same as the formula I
suggested for classical correlations in the previous section.
The key to understanding the efficiency of Deutsch's
algorithm is therefore through the mixedness of the first
register. If the initial state has the entropy $S_0$, then
the final Holevo bound is

$$H = S(\rho) - S_0$$

So the more mixed the first qubit, the less efficient the
computation. Note that the quantum mutual information
between the two qubits is zero throughout the
entire computation (so there are neither classical nor
quantum correlations between them).
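
The dependence of the Holevo bound on the mixedness of the first
qubit can be made explicit with a short calculation. The sketch
below assumes NumPy and models the first qubit as starting in a
mixture of $|+\rangle$ and $|-\rangle$ with weight weight_plus on
$|+\rangle$; this parametrisation and the function names are
assumptions made for illustration. A varying function acts on the
first qubit as a relative phase flip (the Pauli Z matrix), while a
constant function leaves it unchanged.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy of a density matrix, in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]          # drop zero eigenvalues: 0 log 0 = 0
    return float(-np.sum(vals * np.log2(vals)))

PLUS = np.array([1.0, 1.0]) / np.sqrt(2)
MINUS = np.array([1.0, -1.0]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])

def holevo_for_deutsch(weight_plus):
    """Holevo bound H = S(rho) - (S(rho_1) + S(rho_2)) / 2."""
    rho_in = (weight_plus * np.outer(PLUS, PLUS)
              + (1 - weight_plus) * np.outer(MINUS, MINUS))
    rho_1 = rho_in                     # constant f: no relative phase
    rho_2 = Z @ rho_in @ Z             # varying f: relative phase flip
    rho = 0.5 * (rho_1 + rho_2)
    return entropy(rho) - 0.5 * (entropy(rho_1) + entropy(rho_2))

for w in (1.0, 0.9, 0.75, 0.5):
    print(w, holevo_for_deutsch(w))
```

Because the phase flip is unitary, both output states have the
entropy $S_0$ of the input, so the printed values equal
$S(\rho) - S_0$: one bit for a pure input and zero for the equal
mixture.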
B. Computation: Communication in time

Can we extend the above entropic analysis to other
algorithms as well? The answer is yes and this is exactly
what I will describe next (Bose et al., 2000b). To explain
why this is so, I first need to introduce a few definitions
and a communication model of quantum computation.
We have two programmers, the sender and the receiver,
and two registers, the memory (M) register and the
computational (C) register. The sender prepares the memory
register in a certain quantum state $|i\rangle_M$ which encodes
the problem to be solved. For example, in the case of
factorization (Shor, 1996), this register will store the
number to be factored. In case of a search (Grover, 1996),
this register will store the state of the list to be searched.
The number $N$ of possible states $|i\rangle_M$ will, of course, be
limited by the greatest number that the given computer
could factor or the largest list that it could search. The
receiver then prepares the computational register in some
initial state $\rho^0_C$. Both the sender and the receiver feed the
registers (prepared by them) to the quantum computer.
The quantum computer implements the following general
transformation on the registers

$$(|i\rangle\langle i|)_M \otimes \rho^0_C \to (|i\rangle\langle i|)_M \otimes U_i \rho^0_C U_i^{\dagger}$$ (83)

The resulting state $\rho_C(i) = U_i \rho^0_C U_i^{\dagger}$ of the computational
register contains the answer to the computation and is
measured by the receiver. As the quantum computation
should work for any $|i\rangle_M$, it should also work for any
mixture $\sum_i p_i (|i\rangle\langle i|)_M$, where $p_i$ are probabilities. For
the sender to use the above computation as a
communication protocol, he has to prepare any one of the states
$|i\rangle_M$ with an a priori probability $p_i$. The entire input
ensemble is thus $\sum_i p_i (|i\rangle\langle i|)_M \otimes \rho^0_C$. Due to the quantum
computation, this becomes

$$\sum_i^N p_i (|i\rangle\langle i|)_M \otimes \rho^0_C \to \sum_i^N p_i (|i\rangle\langle i|)_M \otimes \rho_C(i)$$ (84)

Whereas before the quantum computation the two
registers were completely uncorrelated (mutual information
is zero), at the end the mutual information becomes

$$I_{MC} := S(\rho_M) + S(\rho_C) - S(\rho_{MC}) = S(\rho_C) - \sum_i^N p_i S(\rho_C(i))$$ (85)

where $\rho_M$ and $\rho_C$ are the reduced density operators for
the two registers, $\rho_{MC}$ is the density operator of the entire
M+C system and $S(\rho) = -\mathrm{Tr}\,\rho \log \rho$ is the von Neumann
entropy (for conventional reasons we will use $\log_2$ in all
calculations). Notice that the value of the mutual
information (i.e. correlations) is equal to the Holevo bound
$H = S(\rho_C) - \sum_i p_i S(\rho_C(i))$ for the classical capacity of
a quantum communication channel (Holevo, 1973) (note
that $\rho_C = \sum_i p_i \rho_C(i)$). This tells us how much
information the receiver can obtain about the choice $|i\rangle_M$ made
by the sender by measuring the computational register.
The maximum value of $H$ is obtained when the states
$\rho_C(i)$ are pure and orthogonal. Moreover, the sender
conveys the maximum information when all the message
states have equal a priori probability (which also
maximizes the channel capacity). In that case the mutual
information (channel capacity) at the end of the
computation is $\log N$. Thus the communication capacity $I_{MC}$
(given by Eq. (85)) gives an index of the efficiency of a
quantum computation. A necessary target of a quantum
computation is to achieve the maximum possible
communication capacity consistent with given initial states of
the quantum computer. We cannot give a sufficiency
criterion from our general approach as this depends on the
specifics of an algorithm. If one breaks down the general
unitary transformation $U_i$ of a quantum algorithm into a
number of successive unitary blocks, then the maximum
capacity may be achieved only after a number of
applications of the block. In each of the smaller unitary blocks,
the mutual information between the M and the C
registers (i.e. the communication capacity) increases by a
certain amount. When its total value reaches the
maximum possible value consistent with a given initial state
of the quantum computer, the computation is regarded
as being complete.
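
The equality between the mutual information of Eq. (85) and the
Holevo bound can be verified directly for small registers. The
sketch below assumes NumPy; it builds the final state of Eq. (84)
explicitly as a block-diagonal matrix and compares the two
expressions. The function names and the example states are
illustrative choices, not part of the original analysis.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy of a density matrix, in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]          # drop zero eigenvalues: 0 log 0 = 0
    return float(-np.sum(vals * np.log2(vals)))

def mutual_information(probs, outputs):
    """I_MC for the state sum_i p_i |i><i|_M (x) rho_C(i)."""
    n, dim = len(probs), outputs[0].shape[0]
    rho_mc = np.zeros((n * dim, n * dim), dtype=complex)
    for i, (p, rho) in enumerate(zip(probs, outputs)):
        projector = np.zeros((n, n))
        projector[i, i] = 1.0          # |i><i| on the memory register
        rho_mc += p * np.kron(projector, rho)
    rho_m = np.diag(np.asarray(probs, dtype=float))
    rho_c = sum(p * rho for p, rho in zip(probs, outputs))
    return entropy(rho_m) + entropy(rho_c) - entropy(rho_mc)

def holevo(probs, outputs):
    """H = S(rho_C) - sum_i p_i S(rho_C(i))."""
    rho_c = sum(p * rho for p, rho in zip(probs, outputs))
    return entropy(rho_c) - sum(p * entropy(r) for p, r in zip(probs, outputs))

# N = 4 pure, orthogonal outputs with equal probabilities: log N = 2 bits.
basis = [np.diag(np.eye(4)[i]) for i in range(4)]
print(mutual_information([0.25] * 4, basis))

# Non-orthogonal outputs |0> and |+>: less than one bit, and I_MC = H.
zero = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = np.array([[0.5, 0.5], [0.5, 0.5]])
print(mutual_information([0.5, 0.5], [zero, plus]), holevo([0.5, 0.5], [zero, plus]))
```

The second example shows the intermediate regime mentioned in the
text: when the output states overlap, the computation conveys less
than the maximum $\log N$ bits about the sender's choice.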

C. Black box complexity 

Any general quantum algorithm has to have a certain
number of queries into the memory register (Bennett et
al., 1997; Beals et al., 1998; Ambainis, 2000) (this is
necessitated by the fact that the transformation on the
computational register has to depend on the problem at hand,
encoded in $|i\rangle_M$). These queries can be considered to be
implemented by a black box into which the states of both
the memory and the computational registers are fed. The
number of such queries needed in a certain quantum
algorithm gives the black box complexity of that algorithm
(Bennett et al., 1997; Beals et al., 1998; Ambainis, 2000)
and is a lower bound on the complexity of the whole
algorithm. The black box approach is a simplification
for looking at the complexity of an algorithm. A black
box allows us to perform a certain computation without
having its exact details. It is possible that physical
implementations of a particular black box may prove to
be difficult. So when we estimate the complexity of an
algorithm by counting the number of applications of a
black box, we have to bear in mind that there might be an
additional complexity component arising in physical
implementation.

FIG. 9. This figure represents a network that implements
the phase flip operation given the black box computing the
function $f(x)$. The unitary transformation $U$ implements
$|0\rangle \to -|0\rangle$ conditionally on the value of $f(x)$.

In general we have a function $f : \{0,1\}^n \to \{0,1\}$ (so
the function maps n-bit values to either 0 or 1). Quantum
algorithms, such as database search, can be expressed in
this form (in the case of database search, all the values
of f are 0 apart from one which is equal to 1; the task is
to find this value). The black box is assumed to be able
to perform the transformation $|x\rangle|y\rangle \to |x\rangle|f(x)\oplus y\rangle$,
just like in Deutsch's algorithm. We have the freedom to
represent this black box transformation as a phase flip
which is equivalent in power (up to a constant factor as
seen in Fig. 9),

$$|x\rangle|y\rangle \to (-1)^{f(x)}|x\rangle|y\rangle.$$

Recently, Ambainis (2000) showed in a very elegant paper
that if the memory register was prepared initially in
the superposition $\sum_{i}^{N}|i\rangle_M$, then, in a search algorithm,
$O(\sqrt{N})$ queries would be needed to completely entangle
it with the computational register. This gives a lower
bound on the number of queries in a search algorithm. In
a manner analogous to his, we will calculate the change in
mutual information between the memory and the computational
registers (from Eq. (85)) in one query step. The
number of queries needed to increase the mutual information
to $\log N$ (for perfect communication between the
sender and the receiver) is then a lower bound on the
complexity of the algorithm.

D. Database search 

Any search algorithm (whether quantum or classical,
irrespective of its explicit form) will have to find a match
for the state $|i\rangle_M$ of the M register among the states $|j\rangle_C$
of the C register and associate a marker to the state that
matches (here, $|j\rangle_C$ is a complete orthonormal basis for
the C register). The most general way of doing such
a query in the quantum case is the black box unitary
transformation (Ambainis, 2000)

$$U_B|i\rangle_M|j\rangle_C \to (-1)^{\delta_{ij}}|i\rangle_M|j\rangle_C. \qquad (86)$$

Any other unitary transformation performing a query 
matching the states of the M and the C registers, could 
be constructed from the above type of query. Note that
the black box is able to recognize if a value in the C register
is the same as the solution, but is unable to explicitly
provide that solution for us. For example, imagine that
Socrates goes to visit the all-knowing Ancient Greek or- 
acle (black box) who is only able to answer with "yes" 
or "no". Suppose further that Socrates wants to know 
who the wisest person in the world is. He would then 
have to ask something like "Is Plato the wisest person 
in the world?" and would not be able to ask directly 
"Who is the wisest person in the world?". This "yes-no" 
approach is typical for any black box analysis. The ad- 
vantage of using this black box quantum mechanically is 
that we can query all the individual elements of the su- 
perposition simultaneously. Although we can identify the 
solution in one step quantum mechanically, further computations
are required to amplify the right solution so
that the subsequent measurement is more likely to reveal 
it. 
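
The "yes-no" character of the query in Eq. (86) can be illustrated with a minimal sketch. It assumes real amplitudes stored in a plain list, and the names `black_box_query`, `N` and `marked` are hypothetical, chosen only for this illustration. It shows that one query marks the solution with a sign while leaving the measurement probabilities untouched, which is why further amplification steps are needed.

```python
import math

def black_box_query(i, amplitudes):
    """Apply U_B for memory value i: the amplitude of |j> picks up a
    factor (-1)^{delta_ij}, so only the matching entry changes sign."""
    return [-amp if j == i else amp for j, amp in enumerate(amplitudes)]

N = 8        # illustrative database size (assumption)
marked = 5   # hypothetical solution index (assumption)
uniform = [1.0 / math.sqrt(N)] * N   # equal weight superposition of all |j>

after = black_box_query(marked, uniform)
probs_before = [amp ** 2 for amp in uniform]
probs_after = [amp ** 2 for amp in after]

# The solution is marked by a sign flip, but the outcome probabilities in
# the computational basis are the same before and after a single query.
print(after[marked] < 0, probs_before == probs_after)
```

A measurement made directly after the query would therefore still return each entry with probability 1/N; the sign information only becomes visible after interference with the other amplitudes.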

I would like to put a bound on the change of the mutual
information in one such black box step. Let the memory
states $|i\rangle_M$ be available to the sender with equal a priori
probability so that the communication capacity is a maximum.
His initial ensemble is then $\frac{1}{N}\sum_{i}^{N}(|i\rangle\langle i|)_M$. Let
the receiver prepare the C register in an initial pure state
$|\psi^0\rangle_C$ (in fact, the power of quantum computation stems
from the ability of the receiver to prepare pure state
superpositions of the form $\frac{1}{\sqrt{N}}\sum_j |j\rangle_C$). This is an equal weight
superposition of all $|j\rangle_C$ as there is no a priori information
about the right $|j\rangle_C$. This can be done by performing a
Hadamard transformation on each qubit of the C register.
In general, there will be many black box steps on
the initial ensemble before a perfect correlation is set up
between the M and the C registers. Let, after the kth
black box step, the state of the system be

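$$\rho^k = \frac{1}{N}\sum_{i}^{N}(|i\rangle\langle i|)_M \otimes (|\psi^k(i)\rangle\langle\psi^k(i)|)_C \qquad (87)$$

where

$$|\psi^k(i)\rangle_C = \sum_{j}\alpha^k_{ij}|j\rangle_C. \qquad (88)$$

The (k+1)th black box step changes this state to
$\rho^{k+1} = \frac{1}{N}\sum_{i}^{N}(|i\rangle\langle i|)_M \otimes (|\psi^{k+1}(i)\rangle\langle\psi^{k+1}(i)|)_C$ with

$$|\psi^{k+1}(i)\rangle = \sum_{j}^{N}\alpha^k_{ij}(-1)^{\delta_{ij}}|j\rangle_C. \qquad (89)$$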

Thus we only have to evaluate the difference of mutual
information between the M and the C register for the
states $\rho^k$ and $\rho^{k+1}$. This difference of mutual information
(when computed from Eq. (85)) can be shown to be the difference
$|S(\rho_C^{k+1}) - S(\rho_C^k)|$, where $\rho_C^k$ is the reduced state of the
C register (Henderson and Vedral, 2000). This
quantity is bounded from above by (Fannes, 1999)

$$|S(\rho_C^{k+1}) - S(\rho_C^k)| \leq d_B(\rho_C^k,\rho_C^{k+1})\log N - d_B(\rho_C^k,\rho_C^{k+1})\log d_B(\rho_C^k,\rho_C^{k+1}) \qquad (90)$$
where $d_B(\sigma,\rho) = \sqrt{1 - F^2(\sigma,\rho)}$ is the Bures metric (Bures,
1969) and $F(\sigma,\rho) = \mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}$ is the fidelity. Using
methods similar to Ambainis (2000), it can be shown
that $F(\rho_C^0,\rho_C^1) \geq \frac{N-2}{N}$, from which it follows that the
change in the first step is bounded by

$$|S(\rho_C^1) - S(\rho_C^0)| \leq \frac{3}{\sqrt{N}}\log N. \qquad (91)$$

The change $|S(\rho_C^{k+1}) - S(\rho_C^k)|$ in the subsequent steps
has to be less than or equal to the change in the first
step. This is because the Bures metric does not increase
under general completely positive maps (which is what
the query represents when we trace out the M register).
Any other operations performed only on the C register
in between two queries can only reduce the mutual information
between the C and the M register. This means
that at least $O(\sqrt{N})$ steps are needed to produce full correlations
(maximum mutual information of value $\log N$)
between the two registers. This gives the black box lower
bound on the complexity of any quantum search algorithm.
Of course, we know that there also exists an algorithm
achieving this bound due to Grover (1996) and
this has been proven to be optimal (Bennett et al., 1996a;
Ambainis, 2000; Zalka, 1999). However, the proof presented
here is the most general as it holds even when
any type of completely positive map is allowed between
the queries (only in Zalka (1999) was a heuristic argument
made for the optimality of Grover's algorithm under
general operations). Grover's algorithm has also been
implemented experimentally (Jones and Mosca, 1998b).
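
The passage from the fidelity bound to Eq. (91), and from Eq. (91) to the query count, can be spelled out as a short worked derivation. This is only a sketch: it assumes the first-step fidelity bound $F \geq (N-2)/N$ quoted above, uses the fact that $x\log N - x\log x$ is increasing in $x$ for $x < N/e$, and leaves the base of the logarithm arbitrary.

```latex
\begin{align*}
d_B^2 = 1 - F^2 &\le 1 - \Bigl(1 - \tfrac{2}{N}\Bigr)^2
      = \tfrac{4}{N} - \tfrac{4}{N^2} < \tfrac{4}{N}
      \quad\Longrightarrow\quad d_B < \tfrac{2}{\sqrt{N}}, \\
d_B \log N - d_B \log d_B
      &< \tfrac{2}{\sqrt{N}}\log N
       + \tfrac{2}{\sqrt{N}}\Bigl(\tfrac{1}{2}\log N - \log 2\Bigr)
       < \tfrac{3}{\sqrt{N}}\log N, \\
k \cdot \tfrac{3}{\sqrt{N}}\log N \ge \log N
      &\quad\Longrightarrow\quad k \ge \tfrac{\sqrt{N}}{3}.
\end{align*}
```

Since no later step can change the entropy by more than the first one, $k$ queries raise the mutual information by at most $k$ times the right-hand side of Eq. (91), which gives the last line. For $N = 16$ this reads $k \geq 4/3$.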
FIG. 10. The figure shows the circuit for Grover's algorithm.
C is the computational register and M is the memory
register. $U_B$ is the black box query transformation, H is a
Hadamard transformation on every qubit of the C register
and $f_0$ is a phase flip in front of the $|00\ldots0\rangle_C$. The block
consisting of H, $U_B$, H and $f_0$ is repeated a number of times.



I now use Grover's algorithm to show how the mutual
information varies with time in a quantum search. The
general sequence described by Cleve et al. (1997) for
Grover's algorithm will be used here. The algorithm
consists of repeated blocks, each consisting of a
Hadamard transform on each qubit of the C register, followed
by a $U_B$ (our black box transformation), followed
by another Hadamard transform on each qubit of the C
register and finally a phase flip $f_0$ of the $|00\ldots0\rangle_C$ state
of the C register (see Fig. 10). This block can then be
repeated as many times as is necessary to bring the mutual
information to its maximum value of $\log N$, which,
as I have shown using Eq. (91), requires $O(\sqrt{N})$ repetitions. Note that the
only transformation correlating the M and C registers is
the black box transformation $U_B$; all the other transformations
are done only on the C register and therefore
do not change the mutual information between the two
registers. In Fig. 11 I have plotted the variation of mutual
information between the M and the C registers (i.e. the
communication capacity of the quantum computation)
with the number of iterations of the block in Grover's
algorithm. It is seen that the mutual information oscillates
with the number of iterations. Fig. 11 is plotted
for a four qubit computational register which can search
a database of 16 entries. It is seen that the period is
roughly 6, which means that the number of steps needed
to achieve maximum mutual information is roughly 3.
This is well above our bound for the minimum number
of steps, which is 4/3 in this case.
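
The oscillation described above can be reproduced with a small numerical sketch for the pure-state case. Several things here are assumptions of the illustration rather than the procedure used for the figure: the function name `grover_mutual_information` is hypothetical, the standard "phase flip followed by inversion about the mean" form of the block is used (it differs from the Hadamard sequence only by operations on the C register, which leave the entropy unchanged), and the mutual information is taken to be $S(\rho_C)$, which holds for a state of the form of Eq. (87) with pure conditional states because $S(\rho_M) = S(\rho) = \log N$.

```python
import math

def grover_mutual_information(n_qubits, iterations):
    """Mutual information I(M:C) in bits after 0..iterations Grover blocks,
    for a C register that starts in the pure equal weight superposition."""
    N = 2 ** n_qubits
    # By symmetry the C state conditioned on memory value i has amplitude
    # a on |i> and one common amplitude b on each of the other N-1 states.
    a = b = 1.0 / math.sqrt(N)
    info = []
    for _ in range(iterations + 1):
        # rho_C = (1/N) sum_i |psi(i)><psi(i)| equals mu*I plus a multiple of
        # the projector on the uniform state, so its spectrum is mu with
        # multiplicity N-1 and a single eigenvalue 1 - (N-1)*mu.
        mu = (a - b) ** 2 / N
        lam = 1.0 - (N - 1) * mu
        info.append(sum(-m * p * math.log2(p)
                        for p, m in ((mu, N - 1), (lam, 1)) if p > 1e-15))
        a = -a                              # black box query: sign flip on |i>
        mean = (a + (N - 1) * b) / N
        a, b = 2 * mean - a, 2 * mean - b   # inversion about the mean
    return info

info = grover_mutual_information(4, 6)
for k, value in enumerate(info):
    print(k, round(value, 3))
```

For four qubits the values rise from zero, peak at the third block just below the maximum of 4 bits, and fall back close to zero by the sixth block, in line with the period of roughly 6 quoted in the text.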




[Plot omitted. Horizontal axis: number of iterations.]

FIG. 11. The figure shows the dependence of the mutual
information between the M and the C registers as a function
of the number of times the block in Grover's algorithm
is iterated for various values of initial mixedness of the C register.
Each qubit of the C register is initially in the state
$p|0\rangle\langle 0| + (1-p)|1\rangle\langle 1|$: (a) $p = 1$, (b) $p = 0.95$ and (c) $p = 0.7$.
The (a) and (b) computations achieve higher mutual information
than classically allowed in the order of $\sqrt{N}$ steps,
while (c) does not.



The fact that the mutual information oscillates periodically
(or more precisely, "quasi"-periodically) follows
from the quantum Poincaré recurrence theorem (Hogg
and Huberman, 1983). Namely, if the system has a discrete
spectrum and is "driven" by a periodic potential (as
is the case for Grover's algorithm, where we repeat the same operation
time and again), then its wavefunction $\psi(t)$ will undergo
a quasi-periodic motion, i.e. for any $\epsilon > 0$, there exists a
relatively dense set $\{T_\epsilon\}$ such that

$$\|\psi(t + T_\epsilon) - \psi(t)\| < \epsilon$$

for all times t and for each $T_\epsilon$ in the set. This is exactly the
behavior seen in Fig. 11. The distance between the
two states $|\psi\rangle = \sum_i a_i|i\rangle$ and $|\phi\rangle = \sum_j b_j|j\rangle$ is defined
in the usual way,

$$\|\psi - \phi\| = \sum_i |a_i - b_i|.$$

The three graphs (a), (b) and (c) in Fig. 11 are for different
values of initial mixedness of the C register. We find
that the mutual information fails to rise to the maximum
value of $\log N$ when the state of the computational register
is mixed. Our formalism thus allows us to calculate
the performance of a quantum computation as a function
of the mixedness (quantified by the von Neumann
entropy) of the computational register. We can put a
bound on the entropy of the second register after which
the quantum search becomes as inefficient as the classical
search. If the initial entropy $S(\rho_C^0)$ of the C register
exceeds $\frac{1}{2}\log N$, then the change in mutual information
between the M and the C registers in the course of the
entire quantum computation would be at most $\log\sqrt{N}$.
This can be achieved by a classical database search in
$\sqrt{N}$ steps. So there is no advantage in using quantum
evolution when the initial state is too mixed. Our
sufficient condition for no quantum speed-up is therefore

$$S(\rho_C^0) > \frac{1}{2}\log N.$$

Note that this is only a sufficient and not a necessary
condition.
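
As a rough check, the sufficient condition can be evaluated for the three mixedness values quoted in the caption of Fig. 11. The sketch below assumes that the four qubits of the C register are independent and identically prepared, so that the initial entropy is four times the single-qubit entropy, and it works in bits; the function names are hypothetical.

```python
import math

def binary_entropy_bits(p):
    """Entropy in bits of the qubit state p|0><0| + (1-p)|1><1|."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)

def no_speedup_guaranteed(p, n_qubits):
    """True if S(rho_C^0) exceeds (1/2) log2 N, i.e. the sufficient
    condition for no quantum speed-up is met (N = 2^n_qubits)."""
    initial_entropy = n_qubits * binary_entropy_bits(p)  # independent qubits
    threshold = 0.5 * n_qubits                           # (1/2) log2 N
    return initial_entropy > threshold

for p in (1.0, 0.95, 0.7):
    print(p, round(4 * binary_entropy_bits(p), 3), no_speedup_guaranteed(p, 4))
```

With a threshold of 2 bits for $N = 16$, the cases $p = 1$ and $p = 0.95$ (about 1.15 bits) stay below it, while $p = 0.7$ (about 3.53 bits) exceeds it, consistent with curve (c) failing to beat the classical search. Because the condition is only sufficient, staying below the threshold does not by itself guarantee a speed-up.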

I also point out that the states of the M register need
not be a mixture, but could be an arbitrary superposition
of states $|i\rangle_M$ (such a state was used by Ambainis in his
argument (Ambainis, 2000)). All the above arguments
still hold in that case, and the M and the C registers 
become quantum mechanically entangled and not just 
classically correlated. Thus our analysis implies that any 
quantum computation is mathematically identical to a 
measurement process (Everett, 1973). The system being 
measured is the M register and the apparatus is the C 
register of the quantum computer. As the time progresses 
the apparatus (register C) becomes more and more cor- 
related (or entangled) to the system (register M). This 
means that the states of register C become more and 
more distinguishable which allows us to extract more in- 
formation about the M register by measuring the C reg- 
ister. The analysis in the last paragraph, where I showed 
the limitations on the efficiency of quantum computa- 
tion imposed by the mixedness of the C register, applies 
also to the efficiency of a quantum measurement when 
the apparatus is in a mixed state. Mixedness of an ap- 
paratus, to the best of our knowledge, has never been 
considered in the analysis of quantum measurement. In 
general practice, any apparatus, however macroscopic, is 
considered to be in a pure quantum state before the measurement.
Our approach highlighting the formal analogy
between measurement and computation offers a way to
analyse measurement in a much more general context.

Finally, I would like to discuss what would happen if we
decided to change the nature of the black box. Suppose
that instead of being able to recognize the right solution,
the black box is much more powerful and it can compare
whether the individual bit values coincide with the bit values of
the solution. So, for all k,

$$|i_0 i_1 \ldots i_k \ldots i_n\rangle|j_0 j_1 \ldots j_k \ldots j_n\rangle \to (-1)^{\delta_{i_k j_k}}|i_0 i_1 \ldots i_k \ldots i_n\rangle|j_0 j_1 \ldots j_k \ldots j_n\rangle, \qquad (92)$$

where $i = i_0 i_1 \ldots i_k \ldots i_n$ and $j = j_0 j_1 \ldots j_k \ldots j_n$ are the binary
representations of i and j respectively. Then, it can
easily be checked that this gate has the power to correlate
the C and the M register by the amount of $\log 2$.
Therefore the search algorithm would take $\log N$ steps
(instead of $\sqrt{N}$), i.e. it would be polynomial instead of
exponential! There is, of course, a hidden complexity
here which is in constructing the new black box from the
original black box. It can be shown that this requires
an exponential increase in time (or space which can always
be traded for time) and this then compensates the exponential
decrease in the number of applications of the
new black box. In fact, this new black box is equivalent
to the Ancient Greek oracle being able to answer
Socrates' question "Who is the wisest person in the world?".

Can we use entropic measures of the above form to 
quantify the complexity of other quantum algorithms? The
answer is unclear at present. The only algorithm that 
presently achieves an exponential speed-up over its clas- 
sical counterpart, Shor's factorisation algorithm (Shor, 
1996), cannot be usefully re-phrased in terms of black 
box operations (more precisely, it is rather trivial, as it 
requires only one black box operation!). However, this 
does not prevent us from deriving fundamental bounds 
on information storage and the speed of its processing 
based on the Uncertainty principle. In the next subsec- 
tion, I show the ultimate limits of processing power no 
matter what model of computation is used so long as it 
uses quantum systems (particles or fields alike). 

Quantum computation and quantum measurement.
I now show that quantum computation is formally
identical to a quantum measurement as described
by von Neumann (1955). The analysis will be performed
in the most general continuous case. Suppose that we
have a system S (described by a continuous variable x)
and an apparatus A (described by a continuous variable
y) interacting via a Hamiltonian $H = xp$, where p is the
momentum of A (we will assume that $\hbar = 1$). Suppose
in addition that the initial state of the total system is
the uncorrelated state

$$|\Psi(0)\rangle = \int_x \phi(x)|x\rangle\,dx \otimes \int_y \eta(y)|y\rangle\,dy.$$

The action of the above Hamiltonian then transforms
this state into an entangled state.
In order to calculate this transformation it will be beneficial
to introduce the (continuous) Fourier transform

$$F_y : |y\rangle \to \int_p e^{-iyp}|p\rangle\,dp$$

which takes us from the position space of A into the momentum
space of A. This is important because we know
the effect of the Hamiltonian in the momentum basis.
Now, the action of the unitary transformation generated
by H is

$$|\Psi(t)\rangle = e^{-ixpt}|\Psi(0)\rangle = F_y^{-1}e^{-ixpt}F_y|\Psi(0)\rangle = \int_x\int_y \phi(x)\,\eta(y - xt)|x\rangle|y\rangle\,dx\,dy$$

and we see that S and A are now correlated in x and y.
This means that by measuring A we can obtain some information
about the state of S. The mutual information
$I_{AS} = H(x) + H(y) - H(x,y)$ can be shown to satisfy
(Everett, 1973)

$$I_{AS} \geq \ln t,$$

i.e. it grows at least as fast as the logarithm of the time
elapsed during the measurement. This gives us a lower
bound to exactly how quickly correlations can be established
between the system and the apparatus. This is
analogous to the way I derived the upper bound on the
efficiency of quantum search algorithms in Section V.

I now show the detailed calculation of the effect of
the measurement Hamiltonian. Let us define

$$\tilde{\eta}(p) := F_y\{\eta(y)\}.$$

The evolution then proceeds as follows,

$$|\Psi(t)\rangle = e^{-ixpt}\int_x \phi(x)|x\rangle\,dx \otimes \int_y \eta(y)|y\rangle\,dy$$
$$= e^{-ixpt}\int_x\int_p \phi(x)\Bigl\{\int_y \eta(y)e^{-iyp}\,dy\Bigr\}|x\rangle|p\rangle\,dx\,dp$$
$$= e^{-ixpt}\int_x\int_p \phi(x)\,\tilde{\eta}(p)|x\rangle|p\rangle\,dx\,dp$$
$$= \int_x\int_y \phi(x)\Bigl\{\int_p \tilde{\eta}(p)e^{-ip(xt - y)}\,dp\Bigr\}|x\rangle|y\rangle\,dx\,dy$$
$$= \int_x\int_y \phi(x)\,\eta(y - xt)|x\rangle|y\rangle\,dx\,dy.$$

This result has the same formal structure as the quantum
algorithms presented before: a Fourier transform, followed
by a conditional phase shift and then followed by
another Fourier transform (cf. Deutsch's and Grover's
algorithms). Therefore we can see that how efficiently we
can measure something is the same as how efficiently we
can compute, both of which depend on how quickly we can
establish correlations.

E. Ultimate limits of computation: The Bekenstein bound

Given a computer enclosed in a sphere of radius R and 
having available the total amount of energy E what is the 
amount of information that it can store and how quickly 
can this information be processed? The Holevo bound 
gives us the ultimate answer. The amount of information 
that can be written into this volume is bounded from the 
above by the entropy, i.e. the number of distinguishable 
states that this volume can support. I will now use a 
simple, informal argument to obtain this ultimate bound 
(Tipler, 1994), but the rigorous derivation can be found 
in (Bekenstein, 1981). The bound on energy implies a 
bound on momentum and the total number of states in 
the phase space is 



$$N = \frac{PR}{\Delta P\,\Delta R} \leq \frac{PR}{\hbar}$$

where the inequality follows from the Heisenberg uncertainty
relation $\Delta P\,\Delta R \geq \hbar$, which limits the size of the
smallest volume in the phase space to $\hbar$ in each of the
three spatial directions. From relativity we have that for
any particle $p \leq E/c$, so that

$$I \leq \ln N < N \leq \frac{PR}{\hbar} \leq \frac{ER}{\hbar c}$$

which is known as the Bekenstein bound. In reality this 
inequality will most likely be a huge over-estimate, but 
it is important to know that no matter how we encode 
information we cannot perform better than is given by 
our most accurate present theory, quantum mechanics.
As an example consider a hydrogen nucleus:
according to the above result it can encode about
100 bits of information (I assumed that $E = mc^2$ and
that $R = 10^{-15}$ m). At present, NMR quantum computation
achieves "only" one bit per nucleus (and not per
nucleon!): spin "up" and spin "down" are the two states.

From the Bekenstein bound we can derive a bound
on the efficiency of information processing. Again my
derivation will be loose, and a much more careful calculation
confirms what I will present (Bekenstein, 1984).
All the bits in the volume V cannot be processed faster
than it takes light to travel across $V = \frac{4}{3}\pi R^3$, which is
$2R/c$. This gives

$$\frac{dI}{dt} \leq \frac{E}{2\hbar}.$$

Again a hydrogen nucleus can process $10^{24}$ bits per second,
which is also in sharp contrast with NMR quantum
computation where a NOT gate takes roughly a few milliseconds
leading to a maximum of $10^{3}$ bits per second.
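
The orders of magnitude involved can be checked with a few lines of arithmetic. The sketch below is only indicative. The constants are standard SI values, while the choice of the proton mass for the hydrogen nucleus and a radius of $10^{-15}$ m are assumptions. The factor $2\pi$ in `rigorous_bits` is the prefactor of Bekenstein's rigorous form of the bound and is not part of the informal argument given here.

```python
import math

HBAR = 1.054571817e-34        # reduced Planck constant, J s
C = 2.99792458e8              # speed of light, m / s
PROTON_MASS = 1.67262192e-27  # kg (assumed mass of the hydrogen nucleus)
RADIUS = 1e-15                # m (assumed radius of the nucleus)

energy = PROTON_MASS * C ** 2                # E = m c^2
loose_nats = energy * RADIUS / (HBAR * C)    # informal bound E R / (hbar c)
# Same bound with Bekenstein's 2*pi prefactor, converted from nats to bits.
rigorous_bits = 2 * math.pi * loose_nats / math.log(2)
# Storage bound divided by the light crossing time 2R / c gives E / (2 hbar).
rate_per_second = energy / (2 * HBAR)

print(f"{loose_nats:.2f} nats, {rigorous_bits:.1f} bits, "
      f"{rate_per_second:.2e} per second")
```

With these inputs the informal estimate comes out at a few nats, the form with Bekenstein's prefactor at a few tens of bits, and the processing rate between $10^{23}$ and $10^{24}$ per second. Prefactors differ between the informal and the rigorous versions of the argument, so only the order of magnitude should be compared with the figures quoted in the text.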

The Bekenstein bound shows that there is a poten- 
tially great number of under-used degrees of freedom in 
any physical system. This provides hope that quan- 
tum computation will be an experimentally realisable 
goal. At present, there are a number of different practical
implementations of quantum computation, but none of 
them can store and manipulate more than 10 qubits at 
a time (5 was the largest number (Vandersypen, 2000) 
manipulated in a genuine quantum computation process 
when this review was finished in the summer of 2000). 
The above calculation, however, does not take into ac- 
count the environmental influence on computation nor 
the experimental precision. I have not at all touched
on the practical possibility of building a quantum com- 
puter. This is partly for reasons of space, partly because 
it would spoil the flow of exposition and partly because 
there are already a number of excellent reviews of this subject
(Steane, 1997). It is generally acknowledged that the
difficulties in building a quantum computer are only of
a practical nature and there are no fundamental limits that
prohibit such a device. I hope that this section offers con- 
vincing arguments that building a quantum computer is 
a very much worthwhile adventure, both from the tech- 
nological as well as fundamental perspective. In any case 
we see that there is a great deal of currently unused po- 
tential in physical systems in which to store and encode 
information. As our level of technology improves we will 
find more and more ways of getting closer to the Bekenstein
bound.



VI. CONCLUSIONS 

We have seen how distinguishability of different physi- 
cal states is at the heart of information processing, which 
we quantified using the relative entropy. The relative en- 
tropy told us about the possibility of confusing two prob- 
ability distributions, or, in the quantum case, two density 
matrices. We have seen that relative entropy never in- 
creases under any general quantum evolution, meaning 
that states can become only less distinguishable as time 
progresses. The most important consequence of this was 
shown to be the Holevo bound, which is the bound on 
the capacity for classical communication using quantum 
states. This basically told us that n qubits cannot store 
more than n classical bits of information. While this ap- 
pears to be a severe limitation on quantum information 
processing, I showed with the aid of dense coding that 
quantum communication is in some sense more efficient 
than its classical counterpart. Dense coding involved the 
use of entangled states and I therefore showed how the 
quantum relative entropy can be used to quantify entan- 
glement. Moreover, I used the Holevo bound to put lim- 
its on the efficiency of quantum computation by treating 
it as a communication protocol. Quantum algorithms 
were shown to be considerably more efficient for some 
problems than classical algorithms. In particular, I have 
shown in a new way that the quantum database search 
has a square root speed up over the classical database 
search. Efficiency of quantum computation stems from 
the trade-off between two opposite effects: on the one
hand, superpositions allow us to compute in parallel,
while, on the other hand, the Holevo bound limits the 
amount of information we can extract from a quantum 
state. I also emphasised links between the black box 
quantum computation and quantum measurement and 
I showed that there is a fundamental limit to deleting 
information, leading to Landauer's principle that 1 bit 
erased increases the environment information by $k_B \ln 2$.

With every new physical theory comes a new under- 
standing of the world we live in. Through Newtonian 
physics we understood the Universe as a clockwork mech- 
anism. With the subsequent development of thermody- 
namics the Universe became a big Carnot engine, slowly 
evolving towards its final equilibrium state after which no 
useful work could be obtained - the heat death. Presently, 
we see the Universe as an information processing machine -
a computer. Limits to the amount of information it can 
contain and process are given by the most accurate the- 
ory we have, quantum mechanics, giving rise to quantum 
information theory. 

If there is a single moral to be drawn from the rela- 
tionship between information and physics it is that, as 
we dig deeper into the fundamental laws of physics, we
also push back the boundaries of information process- 
ing. It will not be surprising if all the results presented 
in this review are superseded by a higher level generalisation
of which they become an approximation in the same
way that today classical information theory approximates 
quantum information theory. 